diff --git "a/wandb/run-20220505_173748-b097rk18/files/output.log" "b/wandb/run-20220505_173748-b097rk18/files/output.log"
--- "a/wandb/run-20220505_173748-b097rk18/files/output.log"
+++ "b/wandb/run-20220505_173748-b097rk18/files/output.log"
@@ -36731,3 +36731,1944 @@ Model weights saved in ./checkpoint-4500/pytorch_model.bin███████ Feature extractor saved in ./preprocessor_config.jsonl.bin███████████████▌ | 4500/4860 [18:00:01<1:07:51, 11.31s/it]Saving model checkpoint to ./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Adding files tracked by Git LFS: ['wandb/run-20220505_173748-b097rk18/logs/debug-internal.log']. This may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. Adding files tracked by Git LFS: ['wandb/run-20220505_173748-b097rk18/logs/debug-internal.log']. This may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...log']. This may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 2.2231, 'learning_rate': 2.525229357798165e-06, 'epoch': 2.78} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2482, 'learning_rate': 2.518348623853211e-06, 'epoch': 2.78} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1241, 'learning_rate': 2.511467889908257e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1448, 'learning_rate': 2.504587155963303e-06, 'epoch': 2.78} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 2.2204, 'learning_rate': 2.4977064220183485e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2021, 'learning_rate': 2.4908256880733945e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0509, 'learning_rate': 2.4839449541284405e-06, 'epoch': 2.78} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1017, 'learning_rate': 2.4770642201834866e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of time if the files are large./checkpoint-4500 don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0809, 'learning_rate': 2.4633027522935782e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8768, 'learning_rate': 2.456422018348624e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0716, 'learning_rate': 2.4426605504587155e-06, 'epoch': 2.79} +Could not estimate the number of tokens of the input, floating-point operations will not be computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computeds may take a bit of `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.8871, 'learning_rate': 2.4357798165137616e-06, 'epoch': 2.79}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.9398, 'learning_rate': 2.428899082568807e-06, 'epoch': 2.79}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.0203, 'learning_rate': 2.4220183486238532e-06, 'epoch': 2.79}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.8563, 'learning_rate': 2.4082568807339453e-06, 'epoch': 2.79}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.0022, 'learning_rate': 2.401376146788991e-06, 'epoch': 2.79}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.8043, 'learning_rate': 2.394495412844037e-06, 'epoch': 2.79}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.9046, 'learning_rate': 2.3876146788990826e-06, 'epoch': 2.79}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.881, 'learning_rate': 2.3807339449541286e-06, 'epoch': 2.79}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.9277, 'learning_rate': 2.3738532110091743e-06, 'epoch': 2.79}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 93%|██████████████████████████████████████████████████████████████████████▋ | 4524/4860 [18:05:53<44:44, 7.99s/it]
+ 93%|██████████████████████████████████████████████████████████████████████▋ | 4524/4860 [18:05:53<44:44, 7.99s/it]
+{'loss': 1.8105, 'learning_rate': 2.3669724770642203e-06, 'epoch': 2.79}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.7087, 'learning_rate': 2.360091743119266e-06, 'epoch': 2.79}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|██████████████████████████████████████████████████████████████████████▊ | 4527/4860 [18:06:15<41:59, 7.56s/it]
+ 93%|██████████████████████████████████████████████████████████████████████▊ | 4527/4860 [18:06:15<41:59, 7.56s/it]
+{'loss': 1.6716, 'learning_rate': 2.3463302752293576e-06, 'epoch': 2.79}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 93%|██████████████████████████████████████████████████████████████████████▉ | 4536/4860 [18:07:15<34:33, 6.40s/it]
+ 93%|██████████████████████████████████████████████████████████████████████▉ | 4536/4860 [18:07:15<34:33, 6.40s/it]
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.3377, 'learning_rate': 2.2568807339449544e-06, 'epoch': 2.8} +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed15<34:33, 6.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████▏ | 4544/4860 [18:08:33<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [18:08:33<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [18:08:33<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1633, 'learning_rate': 2.208715596330275e-06, 'epoch': 2.81} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1374, 'learning_rate': 2.201834862385321e-06, 'epoch': 2.81} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1352, 'learning_rate': 2.194954128440367e-06, 'epoch': 2.81} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 2.0635, 'learning_rate': 2.188073394495413e-06, 'epoch': 2.81} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1164, 'learning_rate': 2.1811926605504588e-06, 'epoch': 2.81} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2743, 'learning_rate': 2.174311926605505e-06, 'epoch': 2.81} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9629, 'learning_rate': 2.1674311926605504e-06, 'epoch': 2.81} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1459, 'learning_rate': 2.1605504587155965e-06, 'epoch': 2.81} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1064, 'learning_rate': 2.13302752293578e-06, 'epoch': 2.81} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8197, 'learning_rate': 2.119266055045872e-06, 'epoch': 2.81} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0454, 'learning_rate': 2.0917431192660552e-06, 'epoch': 2.82} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0077, 'learning_rate': 2.0711009174311925e-06, 'epoch': 2.82} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9185, 'learning_rate': 2.0642201834862385e-06, 'epoch': 2.82} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.058, 'learning_rate': 2.057339449541284e-06, 'epoch': 2.82} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8024, 'learning_rate': 2.0504587155963306e-06, 'epoch': 2.82} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0106, 'learning_rate': 2.0435779816513762e-06, 'epoch': 2.82} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.6412, 'learning_rate': 2.029816513761468e-06, 'epoch': 2.82} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.881, 'learning_rate': 2.022935779816514e-06, 'epoch': 2.82} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8917, 'learning_rate': 2.0160550458715596e-06, 'epoch': 2.82} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.696, 'learning_rate': 2.0022935779816512e-06, 'epoch': 2.83} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:01:09, 11.61s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|███████████████████████████████████████████████████████████████████████▌ | 4578/4860 [18:13:48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|███████████████████████████████████████████████████████████████████████▌ | 4578/4860 [18:13:48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8032, 'learning_rate': 1.9954128440366973e-06, 'epoch': 2.83} +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.5545, 'learning_rate': 1.988532110091743e-06, 'epoch': 2.83} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed48<34:54, 7.43s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|███████████████████████████████████████████████████████████████████████▋ | 4587/4860 [18:14:47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.3073, 'learning_rate': 1.9197247706422016e-06, 'epoch': 2.83} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.1552, 'learning_rate': 1.9128440366972477e-06, 'epoch': 2.83} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.7751, 'learning_rate': 1.9059633027522935e-06, 'epoch': 2.83} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.3882, 'learning_rate': 1.8990825688073395e-06, 'epoch': 2.83} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1895, 'learning_rate': 1.878440366972477e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2617, 'learning_rate': 1.871559633027523e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2397, 'learning_rate': 1.864678899082569e-06, 'epoch': 2.84} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.046, 'learning_rate': 1.8577981651376147e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.3836, 'learning_rate': 1.8509174311926608e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0547, 'learning_rate': 1.8440366972477066e-06, 'epoch': 2.84} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 2.2102, 'learning_rate': 1.8371559633027524e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1054, 'learning_rate': 1.8302752293577983e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0847, 'learning_rate': 1.823394495412844e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1168, 'learning_rate': 1.8165137614678901e-06, 'epoch': 2.84} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0271, 'learning_rate': 1.8027522935779818e-06, 'epoch': 2.84} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0371, 'learning_rate': 1.7958715596330276e-06, 'epoch': 2.84} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9469, 'learning_rate': 1.7821100917431195e-06, 'epoch': 2.84} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<28:19, 6.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 1.8259, 'learning_rate': 1.7408256880733947e-06, 'epoch': 2.85}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.9596, 'learning_rate': 1.7339449541284405e-06, 'epoch': 2.85}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.8657, 'learning_rate': 1.7270642201834864e-06, 'epoch': 2.85}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.899, 'learning_rate': 1.7201834862385322e-06, 'epoch': 2.85}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.9392, 'learning_rate': 1.713302752293578e-06, 'epoch': 2.85}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.971, 'learning_rate': 1.706422018348624e-06, 'epoch': 2.85}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.9314, 'learning_rate': 1.6926605504587157e-06, 'epoch': 2.85}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.8262, 'learning_rate': 1.6788990825688074e-06, 'epoch': 2.85}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████▎ | 4627/4860 [18:21:08<29:20, 7.55s/it]
+{'loss': 1.7122, 'learning_rate': 1.658256880733945e-06, 'epoch': 2.86}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.5661, 'learning_rate': 1.651376146788991e-06, 'epoch': 2.86}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 95%|████████████████████████████████████████████████████████████████████████▍ | 4634/4860 [18:21:56<25:35, 6.80s/it]
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.28, 'learning_rate': 1.5756880733944955e-06, 'epoch': 2.86} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.535, 'learning_rate': 1.5550458715596332e-06, 'epoch': 2.87} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed56<25:35, 6.80s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|████████████████████████████████████████████████████████████████████████▋ | 4647/4860 [18:24:02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1786, 'learning_rate': 1.5137614678899084e-06, 'epoch': 2.87} +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed02<41:44, 11.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.2895, 'learning_rate': 1.5068807339449542e-06, 'epoch': 2.87}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.1751, 'learning_rate': 1.5e-06, 'epoch': 2.87}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.0646, 'learning_rate': 1.493119266055046e-06, 'epoch': 2.87}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1212, 'learning_rate': 1.486238532110092e-06, 'epoch': 2.87}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1289, 'learning_rate': 1.4793577981651377e-06, 'epoch': 2.87}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1273, 'learning_rate': 1.4724770642201836e-06, 'epoch': 2.87}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.0631, 'learning_rate': 1.4587155963302754e-06, 'epoch': 2.87}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.0093, 'learning_rate': 1.438073394495413e-06, 'epoch': 2.88}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 96%|████████████████████████████████████████████████████████████████████████▉ | 4663/4860 [18:26:41<30:24, 9.26s/it]
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.9857, 'learning_rate': 1.3967889908256881e-06, 'epoch': 2.88}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.0235, 'learning_rate': 1.389908256880734e-06, 'epoch': 2.88}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.9189, 'learning_rate': 1.3761467889908258e-06, 'epoch': 2.88}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9059, 'learning_rate': 1.3692660550458717e-06, 'epoch': 2.88} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8077, 'learning_rate': 1.3623853211009175e-06, 'epoch': 2.88} +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8419, 'learning_rate': 1.3555045871559633e-06, 'epoch': 2.88} +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8378, 'learning_rate': 1.3486238532110094e-06, 'epoch': 2.88} +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 1.8382, 'learning_rate': 1.334862385321101e-06, 'epoch': 2.89} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.7892, 'learning_rate': 1.3279816513761469e-06, 'epoch': 2.89} +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed41<30:24, 9.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|█████████████████████████████████████████████████████████████████████████▏ | 4680/4860 [18:28:54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|█████████████████████████████████████████████████████████████████████████▏ | 4680/4860 [18:28:54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed54<21:25, 7.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 96%|█████████████████████████████████████████████████████████████████████████▎ | 4689/4860 [18:29:50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 2.3723, 'learning_rate': 1.1972477064220185e-06, 'epoch': 2.9}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 2.2202, 'learning_rate': 1.1834862385321102e-06, 'epoch': 2.9}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 2.2981, 'learning_rate': 1.176605504587156e-06, 'epoch': 2.9}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 2.0552, 'learning_rate': 1.169724770642202e-06, 'epoch': 2.9}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 2.1438, 'learning_rate': 1.1628440366972479e-06, 'epoch': 2.9}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 1.9216, 'learning_rate': 1.1559633027522937e-06, 'epoch': 2.9}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1883, 'learning_rate': 1.1490825688073395e-06, 'epoch': 2.9} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0252, 'learning_rate': 1.1422018348623853e-06, 'epoch': 2.9} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.122, 'learning_rate': 1.1353211009174314e-06, 'epoch': 2.9} +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed50<16:46, 5.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|█████████████████████████████████████████████████████████████████████████▌ | 4708/4860 [18:33:18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:33:18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:33:18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1392, 'learning_rate': 1.0940366972477066e-06, 'epoch': 2.91} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0523, 'learning_rate': 1.073394495412844e-06, 'epoch': 2.91} +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0592, 'learning_rate': 1.059633027522936e-06, 'epoch': 2.91} +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9755, 'learning_rate': 1.0527522935779818e-06, 'epoch': 2.91} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9527, 'learning_rate': 1.0458715596330276e-06, 'epoch': 2.91} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8312, 'learning_rate': 1.0389908256880734e-06, 'epoch': 2.91} +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9744, 'learning_rate': 1.0321100917431193e-06, 'epoch': 2.91} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9069, 'learning_rate': 1.0252293577981653e-06, 'epoch': 2.91} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 2.1377, 'learning_rate': 1.011467889908257e-06, 'epoch': 2.91} +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.7415, 'learning_rate': 9.770642201834863e-07, 'epoch': 2.92} +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.7839, 'learning_rate': 9.701834862385322e-07, 'epoch': 2.92} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed18<24:59, 9.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|█████████████████████████████████████████████████████████████████████████▉ | 4729/4860 [18:36:10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10<15:58, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████ | 4738/4860 [18:37:07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.2001, 'learning_rate': 8.876146788990827e-07, 'epoch': 2.93} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.6872, 'learning_rate': 8.669724770642203e-07, 'epoch': 2.93} +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2574, 'learning_rate': 8.600917431192661e-07, 'epoch': 2.93} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed07<12:21, 6.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|██████████████████████████████████████████████████████████████████████████▏ | 4744/4860 [18:38:12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...60 [18:38:12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:38:12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.167, 'learning_rate': 8.463302752293579e-07, 'epoch': 2.93} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2791, 'learning_rate': 8.325688073394496e-07, 'epoch': 2.93} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.2147, 'learning_rate': 8.256880733944955e-07, 'epoch': 2.93} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0082, 'learning_rate': 8.188073394495414e-07, 'epoch': 2.93} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12<21:59, 11.37s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.2273, 'learning_rate': 8.119266055045872e-07, 'epoch': 2.93}
+{'loss': 2.1166, 'learning_rate': 8.050458715596331e-07, 'epoch': 2.93}
+{'loss': 2.1926, 'learning_rate': 7.98165137614679e-07, 'epoch': 2.93}
+{'loss': 1.9687, 'learning_rate': 7.912844036697248e-07, 'epoch': 2.93}
+{'loss': 2.1042, 'learning_rate': 7.844036697247707e-07, 'epoch': 2.93}
+{'loss': 1.9864, 'learning_rate': 7.775229357798166e-07, 'epoch': 2.94}
+{'loss': 1.9464, 'learning_rate': 7.637614678899084e-07, 'epoch': 2.94}
+{'loss': 1.9346, 'learning_rate': 7.362385321100918e-07, 'epoch': 2.94}
+{'loss': 2.0015, 'learning_rate': 7.224770642201836e-07, 'epoch': 2.94}
+{'loss': 1.885, 'learning_rate': 7.155963302752294e-07, 'epoch': 2.94}
+{'loss': 1.9783, 'learning_rate': 7.087155963302753e-07, 'epoch': 2.94}
+{'loss': 1.928, 'learning_rate': 7.018348623853212e-07, 'epoch': 2.94}
+ 98%|██████████████████████████████████████████████████████████████████████████▌ | 4767/4860 [18:42:00<13:30, 8.71s/it]
+{'loss': 1.857, 'learning_rate': 6.880733944954129e-07, 'epoch': 2.94}
+{'loss': 1.8128, 'learning_rate': 6.811926605504587e-07, 'epoch': 2.94}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang.
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9225, 'learning_rate': 6.743119266055047e-07, 'epoch': 2.94} +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.8057, 'learning_rate': 6.674311926605505e-07, 'epoch': 2.94} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9379, 'learning_rate': 6.605504587155963e-07, 'epoch': 2.95} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed00<13:30, 8.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|██████████████████████████████████████████████████████████████████████████▋ | 4776/4860 [18:43:11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|██████████████████████████████████████████████████████████████████████████▋ | 4776/4860 [18:43:11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 1.4952, 'learning_rate': 5.986238532110092e-07, 'epoch': 2.95} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11<10:41, 7.64s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|██████████████████████████████████████████████████████████████████████████▊ | 4783/4860 [18:44:00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.7615, 'learning_rate': 5.29816513761468e-07, 'epoch': 2.96}
+{'loss': 2.281, 'learning_rate': 5.160550458715596e-07, 'epoch': 2.96}
+{'loss': 2.1928, 'learning_rate': 4.885321100917432e-07, 'epoch': 2.96}
+{'loss': 2.0905, 'learning_rate': 4.6100917431192665e-07, 'epoch': 2.96}
+{'loss': 2.0648, 'learning_rate': 4.5412844036697253e-07, 'epoch': 2.96}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.9818, 'learning_rate': 4.4724770642201836e-07, 'epoch': 2.96}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.1928, 'learning_rate': 4.4036697247706425e-07, 'epoch': 2.97}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.0196, 'learning_rate': 4.3348623853211013e-07, 'epoch': 2.97}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.026, 'learning_rate': 4.1972477064220185e-07, 'epoch': 2.97}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.0409, 'learning_rate': 4.059633027522936e-07, 'epoch': 2.97}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.0851, 'learning_rate': 3.990825688073395e-07, 'epoch': 2.97}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.0059, 'learning_rate': 3.853211009174312e-07, 'epoch': 2.97}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.8247, 'learning_rate': 3.71559633027523e-07, 'epoch': 2.97}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 1.9393, 'learning_rate': 3.509174311926606e-07, 'epoch': 2.97}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9148, 'learning_rate': 3.3715596330275234e-07, 'epoch': 2.97} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9247, 'learning_rate': 3.3027522935779817e-07, 'epoch': 2.98} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.9841, 'learning_rate': 3.2339449541284406e-07, 'epoch': 2.98} +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.828, 'learning_rate': 2.9587155963302754e-07, 'epoch': 2.98} +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<08:57, 6.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▍| 4828/4860 [18:51:00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▍| 4828/4860 [18:51:00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.6235, 'learning_rate': 2.7522935779816514e-07, 'epoch': 2.98} +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.5441, 'learning_rate': 2.3394495412844038e-07, 'epoch': 2.98} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed00<03:59, 7.47s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▌| 4835/4860 [18:51:47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▌| 4835/4860 [18:51:47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 1.1218, 'learning_rate': 1.926605504587156e-07, 'epoch': 2.99} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.0899, 'learning_rate': 1.6513761467889909e-07, 'epoch': 2.99} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1755, 'learning_rate': 1.5825688073394497e-07, 'epoch': 2.99} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 2.1044, 'learning_rate': 1.5137614678899083e-07, 'epoch': 2.99} +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed47<02:46, 6.67s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|███████████████████████████████████████████████████████████████████████████▊| 4848/4860 [18:53:45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 1.9465, 'learning_rate': 1.2385321100917433e-07, 'epoch': 2.99}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed45<02:00, 10.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|███████████████████████████████████████████████████████████████████████████▊| 4851/4860 [18:54:12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:54:12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:54:12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:54:12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:54:12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 1.9055, 'learning_rate': 1.1009174311926606e-07, 'epoch': 2.99}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:54:12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12<01:22, 9.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|███████████████████████████████████████████████████████████████████████████▉| 4853/4860 [18:54:28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [18:54:28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed28<00:59, 8.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|███████████████████████████████████████████████████████████████████████████▉| 4856/4860 [18:54:50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|███████████████████████████████████████████████████████████████████████████▉| 4856/4860 [18:54:50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 1.5827, 'learning_rate': 8.256880733944954e-08, 'epoch': 3.0}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Could not estimate the number of tokens of the input, floating-point operations will not be computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+{'loss': 1.1552, 'learning_rate': 5.504587155963303e-08, 'epoch': 3.0}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Saving model checkpoint to ./ble with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Saving model checkpoint to ./ble with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+Saving model checkpoint to ./ble with gradient checkpointing. Setting `use_cache=False`...e computed50<00:31, 7.78s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.