diff --git "a/wandb/run-20220503_171959-a6039xud/files/output.log" "b/wandb/run-20220503_171959-a6039xud/files/output.log" --- "a/wandb/run-20220503_171959-a6039xud/files/output.log" +++ "b/wandb/run-20220503_171959-a6039xud/files/output.log" @@ -23685,5 +23685,3284 @@ To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
+To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.4047, 'learning_rate': 0.0004081968124629681, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.3667, 'learning_rate': 0.0004081712153718248, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.1638, 'learning_rate': 0.0004081456182806814, 'epoch': 0.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.0257, 'learning_rate': 0.00040812002118953807, 'epoch': 0.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.1283, 'learning_rate': 0.0004080944240983947, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.0839, 'learning_rate': 0.00040806882700725135, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.9073, 'learning_rate': 0.000408043229916108, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 6.761, 'learning_rate': 0.0004080176328249647, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|████████████▊ | 3509/19440 [10:37:30<314:48:56, 71.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:48:56, 71.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:48:56, 71.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 18%|████████████▊ | 3510/19440 [10:37:34<225:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|████████████▊ | 3510/19440 [10:37:34<225:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...10:37:34<225:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6147, 'learning_rate': 0.00040791524446039123, 'epoch': 0.54} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6143, 'learning_rate': 0.0004078896473692479, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 6.7258, 'learning_rate': 0.00040781285609581784, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed25:33:03, 50.97s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████ | 3526/19440 [10:38:33<16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████ | 3526/19440 [10:38:33<16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.3637, 'learning_rate': 0.0004075056910020977, 'epoch': 0.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4459, 'learning_rate': 0.000407454496819811, 'epoch': 0.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:20:15, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████ | 3535/19440 [10:39:01<13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████ | 3535/19440 [10:39:01<13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.866, 'learning_rate': 0.0004072497200906642, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.8022, 'learning_rate': 0.00040719852590837756, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.5195, 'learning_rate': 0.00040717292881723414, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.7655, 'learning_rate': 0.00040714733172609083, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.9768, 'learning_rate': 0.0004071217346349475, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 5.4652, 'learning_rate': 0.0004070961375438041, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:12:42, 2.99s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▏ | 3546/19440 [10:39:31<11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 18%|█████████████▏ | 3546/19440 [10:39:31<11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:37:46, 2.63s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▏ | 3550/19440 [10:39:41<11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.4454, 'learning_rate': 0.00040689136081465733, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.1622, 'learning_rate': 0.000406865763723514, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.947, 'learning_rate': 0.0004068145695412273, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:18:45, 2.56s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▏ | 3557/19440 [10:40:10<16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:40:10<16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:40:10<16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6287, 'learning_rate': 0.0004066353899032239, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6661, 'learning_rate': 0.00040655859862979385, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.5517, 'learning_rate': 0.0004065330015386505, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4914, 'learning_rate': 0.00040650740444750713, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.5268, 'learning_rate': 0.00040645621026522046, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6756, 'learning_rate': 0.0004064306131740771, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.457, 'learning_rate': 0.000406353821900647, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4262, 'learning_rate': 0.0004063282248095037, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:52:36, 3.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▏ | 3577/19440 [10:41:18<14:12:37, 3.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▏ | 3577/19440 [10:41:18<14:12:37, 3.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:41:18<14:12:37, 3.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:12:37, 3.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:12:37, 3.22s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▎ | 3579/19440 [10:41:24<13:47:57, 3.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 18%|█████████████▎ | 3579/19440 [10:41:24<13:47:57, 3.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:41:24<13:47:57, 3.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:47:57, 3.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:47:57, 3.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▎ | 3581/19440 [10:41:30<13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|█████████████▎ | 3581/19440 [10:41:30<13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.0459, 'learning_rate': 0.0004060466568069269, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.0527, 'learning_rate': 0.0004060210597157835, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.9581, 'learning_rate': 0.0004059954626246402, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 6.0802, 'learning_rate': 0.00040594426844235346, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.8121, 'learning_rate': 0.00040591867135121015, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.6238, 'learning_rate': 0.0004058930742600668, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 5.6679, 'learning_rate': 0.00040586747716892343, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.8987, 'learning_rate': 0.00040584188007778007, 'epoch': 0.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.4283, 'learning_rate': 0.00040579068589549335, 'epoch': 0.55} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:28:31, 3.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▎ | 3597/19440 [10:42:14<11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▎ | 3597/19440 [10:42:14<11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:42:14<11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:42:14<11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.2955, 'learning_rate': 0.00040563710334863334, 'epoch': 0.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.2537, 'learning_rate': 0.0004055859091663466, 'epoch': 0.56} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:22:52, 2.59s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 19%|█████████████▎ | 3604/19440 [10:42:39<16:31:54, 3.76s/it]
+ 19%|█████████████▎ | 3605/19440 [10:42:43<16:39:50, 3.79s/it]
+{'loss': 6.6272, 'learning_rate': 0.00040548352080177323, 'epoch': 0.56}
+ 19%|█████████████▎ | 3611/19440 [10:43:05<16:04:11, 3.65s/it]
+ 19%|█████████████▍ | 3618/19440 [10:43:29<14:57:30, 3.40s/it]
+{'loss': 6.5658, 'learning_rate': 0.00040509956443462303, 'epoch': 0.56}
+{'loss': 6.4588, 'learning_rate': 0.0004050483702523363, 'epoch': 0.56}
+{'loss': 6.3709, 'learning_rate': 0.00040502277316119295, 'epoch': 0.56}
+{'loss': 6.4093, 'learning_rate': 0.0004049715789789062, 'epoch': 0.56}
+{'loss': 6.4907, 'learning_rate': 0.0004049459818877629, 'epoch': 0.56}
+{'loss': 6.1539, 'learning_rate': 0.0004048691906143329, 'epoch': 0.56}
+ 19%|█████████████▍ | 3632/19440 [10:44:13<13:19:46, 3.04s/it]
+{'loss': 5.9389, 'learning_rate': 0.0004048179964320462, 'epoch': 0.56}
+{'loss': 5.8883, 'learning_rate': 0.0004047923993409028, 'epoch': 0.56}
+{'loss': 5.7812, 'learning_rate': 0.0004047156080674728, 'epoch': 0.56}
+{'loss': 5.8436, 'learning_rate': 0.0004046900109763294, 'epoch': 0.56}
+{'loss': 5.947, 'learning_rate': 0.0004046644138851861, 'epoch': 0.56}
+{'loss': 5.7194, 'learning_rate': 0.00040463881679404275, 'epoch': 0.56}
+{'loss': 5.6086, 'learning_rate': 0.0004046132197028994, 'epoch': 0.56}
+{'loss': 5.6833, 'learning_rate': 0.0004045876226117561, 'epoch': 0.56}
+{'loss': 5.557, 'learning_rate': 0.00040456202552061266, 'epoch': 0.56}
+{'loss': 5.2888, 'learning_rate': 0.000404510831338326, 'epoch': 0.56}
+{'loss': 4.9377, 'learning_rate': 0.00040445963715603927, 'epoch': 0.56}
+{'loss': 4.8096, 'learning_rate': 0.00040443404006489596, 'epoch': 0.56}
+ 19%|█████████████▌ | 3649/19440 [10:45:00<11:01:23, 2.51s/it]
+{'loss': 4.7124, 'learning_rate': 0.00040438284588260924, 'epoch': 0.56}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length.
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.6476, 'learning_rate': 0.00040435724879146594, 'epoch': 0.56} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:01:23, 2.51s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:01:23, 2.51s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.5323, 'learning_rate': 0.0004043316517003226, 'epoch': 0.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:01:23, 2.51s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:01:23, 2.51s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▌ | 3653/19440 [10:45:16<16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:45:16<16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:45:16<16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.1695, 'learning_rate': 0.00040422926333574913, 'epoch': 0.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.7818, 'learning_rate': 0.0004041524720623191, 'epoch': 0.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:12:00, 3.69s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▌ | 3660/19440 [10:45:42<16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6184, 'learning_rate': 0.0004040500836977457, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4738, 'learning_rate': 0.0004039476953331724, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:14:04, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▌ | 3670/19440 [10:46:17<14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.41, 'learning_rate': 0.000403845306968599, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:38:26, 3.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 19%|█████████████▌ | 3673/19440 [10:46:26<14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▌ | 3673/19440 [10:46:26<14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:46:26<14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:46:26<14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:46:26<14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.3466, 'learning_rate': 0.00040361493314830884, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.1344, 'learning_rate': 0.0004035381418748788, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:27, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▋ | 3685/19440 [10:47:03<12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.7937, 'learning_rate': 0.0004033845593280187, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 5.809, 'learning_rate': 0.00040335896223687534, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.4284, 'learning_rate': 0.0004032821709634453, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.4371, 'learning_rate': 0.00040325657387230195, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.1968, 'learning_rate': 0.0004032053796900153, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.1743, 'learning_rate': 0.0004031797825988719, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.102, 'learning_rate': 0.00040315418550772856, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 4.4612, 'learning_rate': 0.0004031029913254419, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.4476, 'learning_rate': 0.00040307739423429853, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.9634, 'learning_rate': 0.00040302620005201186, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.1751, 'learning_rate': 0.00040300060296086845, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.8215, 'learning_rate': 0.0004029494087785818, 'epoch': 0.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:52:32, 2.94s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▋ | 3708/19440 [10:48:15<16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 6.8952, 'learning_rate': 0.00040287261750515175, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.7188, 'learning_rate': 0.00040282142332286503, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:26:41, 3.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▊ | 3713/19440 [10:48:34<16:11:09, 3.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length.
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:48:34<16:11:09, 3.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:48:34<16:11:09, 3.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.682, 'learning_rate': 0.000402744632049435, 'epoch': 0.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:48:34<16:11:09, 3.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:48:34<16:11:09, 3.71s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:11:09, 3.71s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
+{'loss': 6.5296, 'learning_rate': 0.00040266784077600497, 'epoch': 0.57}
+ 19%|█████████████▊ | 3719/19440 [10:48:54<14:48:13, 3.39s/it]
+{'loss': 6.4369, 'learning_rate': 0.0004025910495025749, 'epoch': 0.57}
+{'loss': 6.1943, 'learning_rate': 0.0004025654524114316, 'epoch': 0.57}
+{'loss': 6.2071, 'learning_rate': 0.0004025398553202882, 'epoch': 0.57}
+{'loss': 6.3593, 'learning_rate': 0.00040251425822914486, 'epoch': 0.57}
+{'loss': 6.4095, 'learning_rate': 0.0004024886611380015, 'epoch': 0.57}
+{'loss': 6.4348, 'learning_rate': 0.00040243746695571483, 'epoch': 0.57}
+{'loss': 6.1581, 'learning_rate': 0.0004023094814999981, 'epoch': 0.58}
+ 19%|█████████████▊ | 3732/19440 [10:49:34<13:01:59, 2.99s/it]
+{'loss': 6.0797, 'learning_rate': 0.00040223269022656805, 'epoch': 0.58}
+{'loss': 6.1245, 'learning_rate': 0.00040220709313542474, 'epoch': 0.58}
+ 19%|█████████████▊ | 3737/19440 [10:49:48<12:26:02, 2.85s/it]
+ 19%|█████████████▊ | 3739/19440 [10:49:54<12:41:22, 2.91s/it]
+{'loss': 5.7996, 'learning_rate': 0.00040207910767970793, 'epoch': 0.58}
+{'loss': 5.7537, 'learning_rate': 0.00040205351058856463, 'epoch': 0.58}
+{'loss': 5.4465, 'learning_rate': 0.0004020279134974212, 'epoch': 0.58}
+{'loss': 5.3708, 'learning_rate': 0.0004020023164062779, 'epoch': 0.58}
+{'loss': 5.445, 'learning_rate': 0.0004019767193151346, 'epoch': 0.58}
+ 19%|█████████████▊ | 3745/19440 [10:50:10<11:27:33, 2.63s/it]
+{'loss': 4.8466, 'learning_rate': 0.0004018487338594178, 'epoch': 0.58}
+ 19%|█████████████▉ | 3750/19440 [10:50:23<10:59:51, 2.52s/it]
+{'loss': 7.3388, 'learning_rate': 0.0004017207484037011, 'epoch': 0.58}
+{'loss': 7.0418, 'learning_rate': 0.00040169515131255773, 'epoch': 0.58}
+{'loss': 6.8135, 'learning_rate': 0.00040159276294798434, 'epoch': 0.58}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.8046, 'learning_rate': 0.0004015415687656977, 'epoch': 0.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.7546, 'learning_rate': 0.00040146477749226765, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.743, 'learning_rate': 0.00040143918040112423, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:59:51, 2.52s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▉ | 3766/19440 [10:51:23<15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▉ | 3766/19440 [10:51:23<15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4773, 'learning_rate': 0.0004013623891276942, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.5407, 'learning_rate': 0.0004013367920365509, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4502, 'learning_rate': 0.00040131119494540753, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed15:11:56, 3.49s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|█████████████▉ | 3771/19440 [10:51:39<14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:51:39<14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:51:39<14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4327, 'learning_rate': 0.0004012600007631208, 'epoch': 0.58} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...[10:51:39<14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.5627, 'learning_rate': 0.0004012088065808341, 'epoch': 0.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6617, 'learning_rate': 0.0004011576123985474, 'epoch': 0.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.0469, 'learning_rate': 0.00040110641821626075, 'epoch': 0.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.7284, 'learning_rate': 0.000400978432760544, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.8238, 'learning_rate': 0.0004008760443959706, 'epoch': 0.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 6.044, 'learning_rate': 0.00040085044730482725, 'epoch': 0.58} +{'loss': 5.8945, 'learning_rate': 0.00040082485021368394, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.4871, 'learning_rate': 0.00040079925312254053, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.5718, 'learning_rate': 0.0004007736560313972, 'epoch': 0.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 5.5259, 'learning_rate': 0.00040074805894025386, 'epoch': 0.59} +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:08:15, 3.25s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████ | 3793/19440 [10:52:43<11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████ | 3793/19440 [10:52:43<11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:52:43<11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...[10:52:43<11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.4225, 'learning_rate': 0.00040067126766682383, 'epoch': 0.59} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:52:43<11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:52:43<11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:52:35, 2.73s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 5.4946, 'learning_rate': 0.0004006200734845371, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5113, 'learning_rate': 0.0004005432822111071, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 7.2313, 'learning_rate': 0.0004004920880288204, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.9508, 'learning_rate': 0.00040046649093767705, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 20%|██████████████ | 3807/19440 [10:53:30<16:35:40, 3.82s/it]
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 7.0987, 'learning_rate': 0.0004003385054819603, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 20%|██████████████ | 3809/19440 [10:53:37<16:22:51, 3.77s/it]
+ 20%|██████████████ | 3809/19440 [10:53:37<16:22:51, 3.77s/it]
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.867, 'learning_rate': 0.00040023611711738696, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.6344, 'learning_rate': 0.00040018492293510024, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.4788, 'learning_rate': 0.0004001337287528135, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|██████████████▏ | 3817/19440 [10:54:06<15:06:23, 3.48s/it]
+ 20%|██████████████▏ | 3817/19440 [10:54:06<15:06:23, 3.48s/it]
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.4828, 'learning_rate': 0.00040005693747938344, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.3252, 'learning_rate': 0.00040003134038824013, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 20%|██████████████▏ | 3822/19440 [10:54:22<14:13:19, 3.28s/it]
+ 20%|██████████████▏ | 3822/19440 [10:54:22<14:13:19, 3.28s/it]
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.3738, 'learning_rate': 0.00039987775784138, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.4185, 'learning_rate': 0.00039982656365909335, 'epoch': 0.59}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.3637, 'learning_rate': 0.00039972417529451996, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.1717, 'learning_rate': 0.0003996729811122333, 'epoch': 0.59}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed14:13:19, 3.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:19, 3.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:19, 3.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed14:13:19, 3.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.9785, 'learning_rate': 0.0003995961898388032, 'epoch': 0.59} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed14:13:19, 3.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+{'loss': 5.7295, 'learning_rate': 0.0003994938014742298, 'epoch': 0.59}
+{'loss': 5.4307, 'learning_rate': 0.00039944260729194315, 'epoch': 0.59}
+ 20%|██████████████▏ | 3844/19440 [10:55:26<11:32:02, 2.66s/it]
+ 20%|██████████████▎ | 3849/19440 [10:55:38<10:36:20, 2.45s/it]
+{'loss': 7.1798, 'learning_rate': 0.000399212233471653, 'epoch': 0.59}
+{'loss': 7.1821, 'learning_rate': 0.000399135442198223, 'epoch': 0.59}
+{'loss': 6.8762, 'learning_rate': 0.00039908424801593626, 'epoch': 0.6}
+ 20%|██████████████▎ | 3858/19440 [10:56:13<16:18:44, 3.77s/it]
+{'loss': 6.9055, 'learning_rate': 0.0003990330538336496, 'epoch': 0.6}
+{'loss': 6.8011, 'learning_rate': 0.0003990074567425062, 'epoch': 0.6}
+{'loss': 6.7894, 'learning_rate': 0.00039898185965136287, 'epoch': 0.6}
+ 20%|██████████████▎ | 3863/19440 [10:56:31<16:01:05, 3.70s/it]
+{'loss': 6.6125, 'learning_rate': 0.00039890506837793284, 'epoch': 0.6}
+{'loss': 6.763, 'learning_rate': 0.0003988794712867895, 'epoch': 0.6}
+{'loss': 6.6407, 'learning_rate': 0.00039885387419564617, 'epoch': 0.6}
+{'loss': 6.5737, 'learning_rate': 0.00039882827710450275, 'epoch': 0.6}
+{'loss': 6.5702, 'learning_rate': 0.00039880268001335945, 'epoch': 0.6}
+{'loss': 6.6363, 'learning_rate': 0.0003987514858310727, 'epoch': 0.6}
+{'loss': 6.581, 'learning_rate': 0.00039872588873992936, 'epoch': 0.6}
+{'loss': 6.6018, 'learning_rate': 0.00039870029164878606, 'epoch': 0.6}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length.
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.2479, 'learning_rate': 0.0003984955149196392, 'epoch': 0.6} +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed16:01:05, 3.70s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3882/19440 [10:57:31<12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3882/19440 [10:57:31<12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.1766, 'learning_rate': 0.0003983675294639225, 'epoch': 0.6} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:58:47, 3.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3887/19440 [10:57:45<12:14:39, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3887/19440 [10:57:45<12:14:39, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:14:39, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:14:39, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:14:39, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3889/19440 [10:57:51<12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3889/19440 [10:57:51<12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:57:51<12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:57:51<12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...[10:57:51<12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.6603, 'learning_rate': 0.00039821394691706247, 'epoch': 0.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[10:57:51<12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.3642, 'learning_rate': 0.00039816275273477574, 'epoch': 0.6} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.2546, 'learning_rate': 0.0003981371556436324, 'epoch': 0.6} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 4.8303, 'learning_rate': 0.00039800917018791563, 'epoch': 0.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:33:39, 2.91s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▍ | 3900/19440 [10:58:20<10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.3055, 'learning_rate': 0.0003979323789144856, 'epoch': 0.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.9392, 'learning_rate': 0.00039788118473219893, 'epoch': 0.6} +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:56:58, 2.54s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 20%|██████████████▍ | 3912/19440 [10:59:06<15:26:14, 3.58s/it]
+{'loss': 6.4947, 'learning_rate': 0.00039759961672962207, 'epoch': 0.6}
+{'loss': 6.5983, 'learning_rate': 0.00039757401963847876, 'epoch': 0.6}
+{'loss': 6.5021, 'learning_rate': 0.00039744603418276196, 'epoch': 0.61}
+{'loss': 6.3994, 'learning_rate': 0.0003973948400004753, 'epoch': 0.61}
+{'loss': 6.6642, 'learning_rate': 0.00039736924290933193, 'epoch': 0.61}
+ 20%|██████████████▌ | 3925/19440 [10:59:49<14:04:16, 3.27s/it]
+{'loss': 6.4169, 'learning_rate': 0.00039731804872704526, 'epoch': 0.61}
+{'loss': 6.3151, 'learning_rate': 0.00039729245163590195, 'epoch': 0.61}
+{'loss': 6.0751, 'learning_rate': 0.00039721566036247187, 'epoch': 0.61}
+{'loss': 6.1836, 'learning_rate': 0.0003971900632713285, 'epoch': 0.61}
+ 20%|██████████████▌ | 3933/19440 [11:00:13<12:38:58, 2.94s/it]
+{'loss': 5.9151, 'learning_rate': 0.0003970620778156118, 'epoch': 0.61}
+{'loss': 6.0883, 'learning_rate': 0.0003970108836333251, 'epoch': 0.61}
+{'loss': 5.7709, 'learning_rate': 0.00039695968945103837, 'epoch': 0.61}
+{'loss': 5.482, 'learning_rate': 0.0003969084952687517, 'epoch': 0.61}
+{'loss': 5.3689, 'learning_rate': 0.0003968828981776084, 'epoch': 0.61}
+ 20%|██████████████▌ | 3944/19440 [11:00:43<11:23:33, 2.65s/it]
+{'loss': 4.8914, 'learning_rate': 0.0003967549127218916, 'epoch': 0.61}
+{'loss': 4.6131, 'learning_rate': 0.0003967293156307483, 'epoch': 0.61}
+{'loss': 4.6578, 'learning_rate': 0.0003967037185396049, 'epoch': 0.61}
+{'loss': 7.064, 'learning_rate': 0.00039665252435731825, 'epoch': 0.61}
+{'loss': 7.1299, 'learning_rate': 0.00039662692726617483, 'epoch': 0.61}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.2299, 'learning_rate': 0.00039660133017503153, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 7.0332, 'learning_rate': 0.00039657573308388817, 'epoch': 0.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.92, 'learning_rate': 0.00039652453890160144, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.9758, 'learning_rate': 0.0003964477476281714, 'epoch': 0.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.7527, 'learning_rate': 0.000396345359263598, 'epoch': 0.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6747, 'learning_rate': 0.0003962941650813113, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.6611, 'learning_rate': 0.000396268567990168, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4114, 'learning_rate': 0.0003962429708990247, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.7491, 'learning_rate': 0.0003962173738078813, 'epoch': 0.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.4583, 'learning_rate': 0.00039614058253445124, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.2013, 'learning_rate': 0.0003960893883521646, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.3521, 'learning_rate': 0.0003960637912610212, 'epoch': 0.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:23:33, 2.65s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▋ | 3976/19440 [11:02:30<13:45:38, 3.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:02:30<13:45:38, 3.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:02:30<13:45:38, 3.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:45:38, 3.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:45:38, 3.20s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 20%|██████████████▋ | 3978/19440 [11:02:36<13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.1963, 'learning_rate': 0.00039585901453187443, 'epoch': 0.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed13:17:15, 3.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▊ | 3985/19440 [11:02:56<12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|██████████████▊ | 3985/19440 [11:02:56<12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:02:56<12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:02:56<12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 6.1286, 'learning_rate': 0.0003957310290761577, 'epoch': 0.62} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:18:42, 2.87s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|██████████████▊ | 3990/19440 [11:03:11<12:09:58, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|██████████████▊ | 3990/19440 [11:03:11<12:09:58, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:09:58, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:09:58, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.8511, 'learning_rate': 0.0003956286407115843, 'epoch': 0.62} +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:09:58, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed12:09:58, 2.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|██████████████▊ | 3993/19440 [11:03:19<11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|██████████████▊ | 3993/19440 [11:03:19<11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:03:19<11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed11:39:21, 2.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +{'loss': 5.1851, 'learning_rate': 0.0003955262523470109, 'epoch': 0.62} + 21%|██████████████▊ | 3997/19440 [11:03:29<10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|██████████████▊ | 3997/19440 [11:03:29<10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:03:29<10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...[11:03:29<10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Could not estimate the number of tokens of the input, floating-point operations will not be computed10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed10:51:35, 2.53s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%| | 2/3690 [00:01<30:46, 2.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%| | 4/3690 [00:03<51:40, 1.19it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%|▏ | 6/3690 [00:05<58:35, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%|▏ | 8/3690 [00:07<1:01:40, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 0%|▏ | 10/3690 [00:09<1:02:22, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%|▏ | 11/3690 [00:10<1:02:16, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%|▎ | 13/3690 [00:12<1:07:07, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%|▎ | 15/3690 [00:14<1:03:21, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 0%|▎ | 17/3690 [00:16<1:02:25, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▍ | 19/3690 [00:18<1:02:05, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▍ | 21/3690 [00:20<1:02:37, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▍ | 23/3690 [00:22<1:01:08, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▌ | 25/3690 [00:25<1:06:15, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▌ | 27/3690 [00:26<1:01:59, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▌ | 29/3690 [00:29<1:04:20, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 31/3690 [00:31<1:02:14, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 33/3690 [00:33<1:02:19, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▋ | 35/3690 [00:35<1:02:07, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▊ | 37/3690 [00:37<1:03:28, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▊ | 39/3690 [00:39<1:03:52, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▉ | 41/3690 [00:41<1:01:28, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▉ | 43/3690 [00:43<57:49, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|▉ | 45/3690 [00:45<1:00:41, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 1%|█ | 47/3690 [00:47<59:06, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|█ | 49/3690 [00:49<1:04:18, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|█ | 51/3690 [00:51<1:03:50, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|█▏ | 53/3690 [00:53<1:05:20, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 1%|█▏ | 54/3690 [00:54<1:07:55, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▏ | 56/3690 [00:57<1:07:28, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▏ | 58/3690 [00:59<1:06:27, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▎ | 60/3690 [01:01<1:06:08, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▎ | 62/3690 [01:03<1:05:16, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▎ | 64/3690 [01:05<1:04:48, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▍ | 66/3690 [01:07<1:01:27, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▍ | 68/3690 [01:09<1:01:36, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▍ | 70/3690 [01:11<1:02:21, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▌ | 72/3690 [01:13<1:02:58, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▌ | 74/3690 [01:15<1:01:44, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▌ | 75/3690 [01:17<1:04:17, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▋ | 77/3690 [01:19<1:02:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▋ | 79/3690 [01:21<1:00:48, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▋ | 81/3690 [01:23<1:02:05, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 2%|█▊ | 84/3690 [01:25<55:51, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▉ | 86/3690 [01:27<57:53, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▉ | 87/3690 [01:28<58:58, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▉ | 89/3690 [01:31<1:02:35, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 2%|█▉ | 91/3690 [01:33<1:02:08, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|█▉ | 93/3690 [01:35<1:01:19, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██ | 95/3690 [01:37<58:33, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██ | 97/3690 [01:39<1:03:46, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██ | 99/3690 [01:41<1:03:30, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██ | 100/3690 [01:42<1:04:21, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▏ | 103/3690 [01:45<1:01:23, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▏ | 105/3690 [01:47<1:01:08, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▎ | 107/3690 [01:49<1:02:48, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▎ | 109/3690 [01:51<58:52, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▎ | 111/3690 [01:53<1:00:25, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▍ | 113/3690 [01:55<56:54, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▍ | 115/3690 [01:57<58:01, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▌ | 117/3690 [01:59<59:12, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▌ | 119/3690 [02:01<59:29, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 3%|██▌ | 121/3690 [02:03<1:00:25, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▌ | 123/3690 [02:05<1:01:12, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▋ | 125/3690 [02:07<1:01:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▋ | 127/3690 [02:09<1:02:03, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 3%|██▋ | 129/3690 [02:11<1:00:28, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|██▊ | 131/3690 [02:13<1:01:06, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|██▉ | 133/3690 [02:15<59:05, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|██▊ | 135/3690 [02:17<1:00:31, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|██▊ | 136/3690 [02:18<1:00:56, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|██▉ | 138/3690 [02:21<1:02:54, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|██▉ | 140/3690 [02:23<1:04:24, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███ | 142/3690 [02:25<1:01:32, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███ | 144/3690 [02:27<59:27, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███ | 146/3690 [02:29<1:00:11, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▏ | 148/3690 [02:31<1:00:54, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▏ | 150/3690 [02:33<1:01:27, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▏ | 152/3690 [02:35<1:00:40, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▎ | 154/3690 [02:37<58:56, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▎ | 156/3690 [02:39<1:00:52, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 4%|███▍ | 158/3690 [02:41<58:59, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▍ | 160/3690 [02:43<1:01:30, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▍ | 161/3690 [02:44<1:03:27, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▍ | 163/3690 [02:47<1:04:10, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 4%|███▍ | 165/3690 [02:49<1:00:31, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▌ | 167/3690 [02:51<59:50, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▋ | 169/3690 [02:53<58:27, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▌ | 171/3690 [02:55<1:00:25, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▋ | 173/3690 [02:57<1:01:44, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▋ | 175/3690 [02:59<1:03:41, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▊ | 177/3690 [03:01<59:38, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▉ | 179/3690 [03:03<59:17, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▉ | 181/3690 [03:05<59:16, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▊ | 183/3690 [03:07<1:01:17, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|████ | 185/3690 [03:09<57:49, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▉ | 187/3690 [03:11<1:00:09, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|███▉ | 189/3690 [03:13<1:01:00, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|████ | 191/3690 [03:15<1:01:28, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|████▏ | 193/3690 [03:17<58:38, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 5%|████▏ | 195/3690 [03:19<58:15, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|████▎ | 197/3690 [03:21<58:08, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|████▎ | 199/3690 [03:23<56:06, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 5%|████▎ | 201/3690 [03:25<56:58, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▍ | 203/3690 [03:27<58:17, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▍ | 205/3690 [03:29<58:33, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▍ | 207/3690 [03:31<1:01:24, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▌ | 209/3690 [03:33<59:46, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▌ | 212/3690 [03:36<55:45, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▋ | 214/3690 [03:38<56:34, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▋ | 216/3690 [03:40<56:45, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▌ | 218/3690 [03:42<1:01:03, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▋ | 219/3690 [03:43<57:20, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▋ | 221/3690 [03:46<1:02:43, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▋ | 223/3690 [03:48<1:01:28, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▊ | 225/3690 [03:50<1:00:42, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▊ | 227/3690 [03:52<1:03:06, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|████▉ | 229/3690 [03:54<57:38, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|█████ | 231/3690 [03:56<57:11, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 6%|█████ | 233/3690 [03:58<56:34, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|█████ | 235/3690 [04:00<59:44, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|█████ | 237/3690 [04:02<1:00:50, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 6%|█████▏ | 239/3690 [04:04<58:42, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▏ | 240/3690 [04:05<59:29, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▏ | 242/3690 [04:07<58:58, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▎ | 245/3690 [04:10<56:47, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▎ | 247/3690 [04:12<55:28, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▍ | 249/3690 [04:14<56:41, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▍ | 251/3690 [04:16<57:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▍ | 253/3690 [04:18<56:27, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▌ | 255/3690 [04:20<56:33, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▌ | 257/3690 [04:22<57:14, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▌ | 259/3690 [04:24<57:18, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▋ | 261/3690 [04:26<58:44, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▋ | 262/3690 [04:27<59:12, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▌ | 264/3690 [04:29<1:01:03, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▌ | 266/3690 [04:31<1:00:02, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▋ | 268/3690 [04:34<1:01:05, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 7%|█████▊ | 270/3690 [04:36<59:12, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▉ | 272/3690 [04:38<59:13, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▉ | 274/3690 [04:40<59:05, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 7%|█████▉ | 276/3690 [04:42<58:40, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████ | 278/3690 [04:44<58:01, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████ | 280/3690 [04:46<58:07, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|█████▉ | 282/3690 [04:48<1:01:55, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████ | 284/3690 [04:50<1:00:02, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▏ | 286/3690 [04:52<58:06, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████ | 287/3690 [04:53<1:00:32, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▎ | 290/3690 [04:56<57:25, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▎ | 292/3690 [04:58<56:04, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▎ | 294/3690 [05:00<53:50, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▍ | 296/3690 [05:02<52:14, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▍ | 298/3690 [05:04<57:37, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▌ | 300/3690 [05:06<56:39, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▌ | 302/3690 [05:08<54:03, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▌ | 304/3690 [05:10<57:32, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▋ | 306/3690 [05:12<58:21, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 8%|██████▋ | 308/3690 [05:14<55:40, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▋ | 310/3690 [05:16<54:49, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 8%|██████▊ | 312/3690 [05:18<57:24, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|██████▊ | 314/3690 [05:20<54:05, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|██████▊ | 316/3690 [05:22<53:28, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|██████▉ | 318/3690 [05:24<54:03, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|██████▉ | 320/3690 [05:26<55:04, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|██████▉ | 322/3690 [05:28<56:44, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████ | 324/3690 [05:30<57:06, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|██████▉ | 326/3690 [05:32<1:02:15, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████ | 328/3690 [05:34<59:00, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▏ | 330/3690 [05:36<57:23, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▏ | 332/3690 [05:38<54:39, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▏ | 334/3690 [05:40<55:38, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▎ | 336/3690 [05:42<58:34, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▎ | 338/3690 [05:45<58:59, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▎ | 340/3690 [05:47<58:22, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▍ | 342/3690 [05:49<55:34, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▍ | 344/3690 [05:50<54:16, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 346/3690 [05:52<52:32, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 348/3690 [05:54<54:52, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 350/3690 [05:56<55:41, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▋ | 352/3690 [05:59<58:28, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▋ | 354/3690 [06:00<54:08, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▋ | 356/3690 [06:02<52:40, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▊ | 358/3690 [06:04<53:36, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▊ | 360/3690 [06:06<52:24, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▊ | 362/3690 [06:08<54:36, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▉ | 364/3690 [06:10<56:03, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▉ | 366/3690 [06:12<56:19, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|███████▉ | 368/3690 [06:14<56:30, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████ | 370/3690 [06:16<53:58, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████ | 372/3690 [06:18<58:03, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████ | 374/3690 [06:21<58:46, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████▏ | 376/3690 [06:23<56:59, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████▏ | 378/3690 [06:24<54:16, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████▏ | 380/3690 [06:27<59:04, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████▎ | 382/3690 [06:29<55:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████▎ | 384/3690 [06:31<53:24, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 10%|████████▎ | 386/3690 [06:33<56:10, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▍ | 388/3690 [06:35<55:11, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▍ | 390/3690 [06:37<53:17, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▍ | 392/3690 [06:39<54:16, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▌ | 394/3690 [06:41<55:10, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▌ | 396/3690 [06:43<57:08, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▋ | 398/3690 [06:45<56:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▋ | 400/3690 [06:47<54:38, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▋ | 402/3690 [06:49<57:28, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▊ | 404/3690 [06:51<58:00, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▊ | 406/3690 [06:53<56:48, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▊ | 408/3690 [06:55<54:32, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▉ | 410/3690 [06:57<57:58, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▋ | 411/3690 [06:58<1:00:25, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|████████▉ | 413/3690 [07:01<57:55, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|█████████ | 416/3690 [07:03<53:43, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|█████████ | 418/3690 [07:05<53:21, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|█████████ | 419/3690 [07:06<54:59, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|█████████▏ | 421/3690 [07:09<58:13, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 11%|█████████▏ | 423/3690 [07:11<57:11, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▏ | 425/3690 [07:13<54:39, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▎ | 427/3690 [07:15<54:53, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▎ | 429/3690 [07:17<59:49, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▎ | 431/3690 [07:19<55:55, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▍ | 433/3690 [07:21<57:31, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▍ | 435/3690 [07:23<56:00, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▍ | 437/3690 [07:25<56:19, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▌ | 439/3690 [07:27<56:23, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▌ | 441/3690 [07:29<54:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▌ | 443/3690 [07:31<54:16, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▋ | 445/3690 [07:33<53:19, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▋ | 447/3690 [07:35<52:59, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▋ | 449/3690 [07:38<57:31, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▊ | 451/3690 [07:40<54:07, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▊ | 453/3690 [07:41<52:33, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▊ | 455/3690 [07:44<55:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▉ | 457/3690 [07:46<55:50, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▉ | 459/3690 [07:48<55:09, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 12%|█████████▉ | 460/3690 [07:49<55:27, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████ | 462/3690 [07:51<52:49, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████ | 464/3690 [07:52<51:03, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████ | 466/3690 [07:54<51:54, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▏ | 468/3690 [07:57<56:33, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▏ | 470/3690 [07:59<53:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▏ | 472/3690 [08:00<52:25, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▎ | 474/3690 [08:03<53:10, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▎ | 476/3690 [08:05<55:24, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▎ | 478/3690 [08:07<53:10, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▍ | 480/3690 [08:08<51:48, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▍ | 482/3690 [08:11<54:29, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▍ | 484/3690 [08:13<54:27, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▌ | 486/3690 [08:15<55:08, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▌ | 488/3690 [08:17<51:52, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▌ | 490/3690 [08:19<52:30, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▋ | 492/3690 [08:21<53:32, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▋ | 494/3690 [08:23<55:19, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▊ | 496/3690 [08:25<53:44, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 13%|██████████▊ | 498/3690 [08:27<53:11, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|██████████▊ | 501/3690 [08:29<49:10, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|██████████▉ | 503/3690 [08:31<49:33, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|██████████▉ | 504/3690 [08:32<53:05, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|██████████▉ | 506/3690 [08:35<56:42, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████ | 508/3690 [08:37<54:37, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████ | 510/3690 [08:39<57:33, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████ | 512/3690 [08:41<58:51, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▏ | 514/3690 [08:43<57:04, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▏ | 516/3690 [08:45<56:34, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▏ | 518/3690 [08:47<53:59, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▎ | 520/3690 [08:49<53:42, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▎ | 522/3690 [08:51<53:41, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▎ | 523/3690 [08:53<57:09, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▍ | 525/3690 [08:55<56:46, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▍ | 527/3690 [08:57<56:34, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▍ | 529/3690 [08:59<53:40, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 531/3690 [09:01<52:43, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 533/3690 [09:03<53:16, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 535/3690 [09:05<54:16, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▋ | 537/3690 [09:07<51:45, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▋ | 539/3690 [09:09<54:13, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▋ | 541/3690 [09:11<53:28, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▊ | 543/3690 [09:13<53:39, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▊ | 545/3690 [09:15<53:20, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▊ | 547/3690 [09:17<53:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▉ | 549/3690 [09:19<52:45, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▉ | 551/3690 [09:21<54:18, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|███████████▉ | 553/3690 [09:23<55:10, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████ | 555/3690 [09:25<53:16, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████ | 557/3690 [09:27<51:14, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████ | 559/3690 [09:29<53:03, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████▏ | 561/3690 [09:31<52:54, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████▏ | 563/3690 [09:34<55:11, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████▏ | 564/3690 [09:34<52:24, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████▎ | 567/3690 [09:38<52:12, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████▎ | 569/3690 [09:39<50:53, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 15%|████████████▍ | 571/3690 [09:42<52:06, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▍ | 573/3690 [09:43<50:59, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▍ | 575/3690 [09:46<53:21, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▌ | 577/3690 [09:48<52:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▌ | 579/3690 [09:50<51:52, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▌ | 580/3690 [09:51<53:35, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▌ | 582/3690 [09:53<55:00, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▋ | 584/3690 [09:55<54:50, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▋ | 586/3690 [09:57<51:01, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▋ | 588/3690 [09:59<51:05, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▊ | 591/3690 [10:02<50:45, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▊ | 593/3690 [10:04<48:35, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▉ | 595/3690 [10:06<51:33, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▉ | 597/3690 [10:08<52:32, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|████████████▉ | 599/3690 [10:10<51:27, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|█████████████ | 601/3690 [10:12<51:52, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|█████████████ | 603/3690 [10:14<52:30, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|█████████████ | 605/3690 [10:16<54:33, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 16%|█████████████▏ | 606/3690 [10:17<54:04, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▏ | 609/3690 [10:20<48:09, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▏ | 611/3690 [10:22<48:57, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▎ | 613/3690 [10:24<50:49, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▎ | 615/3690 [10:26<50:16, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▍ | 617/3690 [10:28<52:43, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▍ | 619/3690 [10:30<49:28, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▍ | 621/3690 [10:32<50:22, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▌ | 623/3690 [10:34<50:15, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▌ | 625/3690 [10:36<53:49, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▌ | 627/3690 [10:38<51:47, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▋ | 629/3690 [10:40<52:56, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▋ | 631/3690 [10:42<51:25, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▋ | 633/3690 [10:44<52:40, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▊ | 635/3690 [10:46<48:57, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▊ | 637/3690 [10:48<47:31, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▊ | 639/3690 [10:50<48:45, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▉ | 641/3690 [10:52<47:52, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▉ | 643/3690 [10:54<52:16, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 17%|█████████████▉ | 645/3690 [10:56<52:12, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████ | 647/3690 [10:58<52:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████ | 649/3690 [11:00<51:01, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████ | 651/3690 [11:02<50:14, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▏ | 653/3690 [11:04<50:43, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▏ | 655/3690 [11:06<47:39, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▏ | 657/3690 [11:08<47:50, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▎ | 659/3690 [11:10<48:32, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▎ | 661/3690 [11:12<49:38, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▎ | 663/3690 [11:14<53:51, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▍ | 665/3690 [11:16<54:59, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▍ | 667/3690 [11:18<50:27, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▌ | 669/3690 [11:20<50:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▌ | 671/3690 [11:22<49:39, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▌ | 673/3690 [11:24<50:42, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▋ | 675/3690 [11:26<50:49, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▋ | 677/3690 [11:28<51:23, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▋ | 679/3690 [11:30<51:38, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 18%|██████████████▊ | 681/3690 [11:32<51:38, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|██████████████▊ | 683/3690 [11:34<46:59, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|██████████████▊ | 685/3690 [11:36<48:26, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|██████████████▉ | 687/3690 [11:38<48:14, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|██████████████▉ | 689/3690 [11:40<47:57, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|██████████████▉ | 691/3690 [11:42<49:42, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████ | 693/3690 [11:44<49:32, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████ | 695/3690 [11:46<48:56, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████ | 697/3690 [11:48<50:51, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▏ | 699/3690 [11:50<52:18, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▏ | 700/3690 [11:51<52:25, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▏ | 702/3690 [11:53<53:24, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▎ | 704/3690 [11:55<53:18, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▎ | 706/3690 [11:57<48:54, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▎ | 708/3690 [11:59<51:04, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▍ | 710/3690 [12:02<55:08, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▍ | 712/3690 [12:04<54:09, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▍ | 714/3690 [12:06<51:53, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▌ | 716/3690 [12:08<48:36, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 19%|███████████████▌ | 718/3690 [12:10<47:12, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▌ | 720/3690 [12:12<53:23, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▋ | 722/3690 [12:14<52:42, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▋ | 724/3690 [12:16<50:00, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▋ | 726/3690 [12:18<50:25, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▊ | 728/3690 [12:20<49:51, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▊ | 730/3690 [12:22<51:12, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▊ | 732/3690 [12:24<48:30, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▉ | 734/3690 [12:26<51:46, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|███████████████▉ | 736/3690 [12:28<49:15, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████ | 738/3690 [12:30<50:16, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████ | 740/3690 [12:32<46:32, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████ | 742/3690 [12:34<47:58, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▏ | 744/3690 [12:36<51:20, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▏ | 746/3690 [12:38<49:27, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▏ | 748/3690 [12:40<48:24, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▎ | 750/3690 [12:42<50:29, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▎ | 752/3690 [12:44<49:51, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▎ | 754/3690 [12:46<47:18, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 20%|████████████████▍ | 756/3690 [12:48<48:37, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▍ | 757/3690 [12:49<51:21, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▍ | 759/3690 [12:51<51:09, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▍ | 761/3690 [12:54<52:16, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▌ | 763/3690 [12:56<51:11, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▌ | 765/3690 [12:58<51:17, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▋ | 767/3690 [13:00<49:12, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▋ | 769/3690 [13:02<50:47, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▋ | 771/3690 [13:04<49:23, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 773/3690 [13:06<47:26, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 775/3690 [13:08<49:37, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 777/3690 [13:10<47:00, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▉ | 779/3690 [13:12<46:30, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▉ | 781/3690 [13:14<47:27, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▉ | 783/3690 [13:16<48:07, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|█████████████████ | 785/3690 [13:18<48:01, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|█████████████████ | 787/3690 [13:20<48:37, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|█████████████████ | 789/3690 [13:22<50:43, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|█████████████████▏ | 791/3690 [13:24<53:04, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|█████████████████▏ | 793/3690 [13:26<51:53, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▏ | 795/3690 [13:28<51:10, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▎ | 797/3690 [13:30<48:27, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▎ | 799/3690 [13:33<50:26, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▎ | 801/3690 [13:35<49:45, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▍ | 803/3690 [13:37<49:52, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▍ | 805/3690 [13:39<49:58, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▍ | 806/3690 [13:40<50:08, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▌ | 808/3690 [13:42<51:25, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▌ | 810/3690 [13:44<54:10, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▌ | 812/3690 [13:47<52:40, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▋ | 814/3690 [13:48<48:55, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▋ | 816/3690 [13:50<46:08, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▋ | 818/3690 [13:52<46:17, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▊ | 820/3690 [13:54<48:21, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▊ | 822/3690 [13:56<47:45, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▊ | 824/3690 [13:58<48:06, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▉ | 826/3690 [14:00<48:54, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▉ | 828/3690 [14:03<49:23, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 22%|█████████████████▉ | 830/3690 [14:04<47:17, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████ | 832/3690 [14:07<48:41, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████ | 834/3690 [14:08<47:19, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████ | 836/3690 [14:10<44:54, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▏ | 838/3690 [14:12<44:13, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▏ | 840/3690 [14:14<47:43, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▎ | 842/3690 [14:17<51:48, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▎ | 844/3690 [14:19<50:58, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▎ | 846/3690 [14:21<49:23, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▍ | 848/3690 [14:23<46:22, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▍ | 850/3690 [14:25<48:38, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▍ | 852/3690 [14:27<47:05, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▌ | 854/3690 [14:29<46:48, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▌ | 856/3690 [14:31<48:10, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▌ | 858/3690 [14:33<47:32, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▋ | 860/3690 [14:35<47:53, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▋ | 862/3690 [14:37<45:17, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▋ | 864/3690 [14:39<45:51, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 23%|██████████████████▊ | 866/3690 [14:41<48:25, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|██████████████████▊ | 868/3690 [14:43<46:46, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|██████████████████▊ | 870/3690 [14:45<46:44, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|██████████████████▉ | 873/3690 [14:48<49:23, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|██████████████████▉ | 875/3690 [14:50<47:44, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████ | 877/3690 [14:52<46:32, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████ | 879/3690 [14:53<42:37, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████ | 881/3690 [14:55<45:19, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▏ | 883/3690 [14:57<46:01, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▏ | 884/3690 [14:58<45:50, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▏ | 886/3690 [15:01<49:29, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▎ | 888/3690 [15:03<49:07, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▎ | 890/3690 [15:05<47:13, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▎ | 892/3690 [15:07<47:37, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▍ | 894/3690 [15:09<45:05, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▍ | 896/3690 [15:11<45:56, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▍ | 898/3690 [15:13<42:59, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▌ | 900/3690 [15:15<49:47, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▌ | 902/3690 [15:17<51:52, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 24%|███████████████████▌ | 904/3690 [15:19<48:43, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▋ | 906/3690 [15:21<46:35, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▋ | 908/3690 [15:23<46:59, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▋ | 910/3690 [15:25<49:30, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▊ | 912/3690 [15:27<47:20, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▊ | 914/3690 [15:29<44:53, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▊ | 916/3690 [15:31<45:39, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▉ | 918/3690 [15:33<46:02, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▉ | 920/3690 [15:35<44:31, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|███████████████████▉ | 922/3690 [15:37<47:32, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████ | 924/3690 [15:39<47:39, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████ | 926/3690 [15:41<46:17, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████ | 928/3690 [15:43<44:32, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████▏ | 930/3690 [15:45<47:16, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████▏ | 932/3690 [15:47<44:53, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████▏ | 934/3690 [15:49<43:21, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████▎ | 936/3690 [15:51<45:50, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████▎ | 938/3690 [15:53<46:51, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 25%|████████████████████▍ | 940/3690 [15:56<48:31, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▍ | 942/3690 [15:58<48:32, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▍ | 944/3690 [16:00<47:30, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▌ | 946/3690 [16:02<45:46, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▌ | 948/3690 [16:04<45:14, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▌ | 950/3690 [16:06<45:37, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▋ | 952/3690 [16:08<46:49, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▋ | 954/3690 [16:10<45:42, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▋ | 956/3690 [16:12<46:47, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▊ | 958/3690 [16:14<46:20, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▊ | 960/3690 [16:16<45:39, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▊ | 962/3690 [16:18<46:02, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▉ | 964/3690 [16:20<41:07, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▉ | 966/3690 [16:21<41:49, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|████████████████████▉ | 968/3690 [16:23<41:11, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|█████████████████████ | 970/3690 [16:25<44:45, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|█████████████████████ | 972/3690 [16:27<43:29, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|█████████████████████ | 974/3690 [16:30<48:43, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 26%|█████████████████████▏ | 976/3690 [16:31<45:42, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▏ | 978/3690 [16:33<43:29, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▏ | 980/3690 [16:35<42:22, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▎ | 982/3690 [16:37<43:49, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▎ | 984/3690 [16:39<42:52, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▍ | 986/3690 [16:41<45:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▍ | 989/3690 [16:44<44:09, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▍ | 991/3690 [16:46<43:37, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▌ | 993/3690 [16:48<41:21, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▌ | 995/3690 [16:50<41:54, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▌ | 997/3690 [16:52<41:35, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▋ | 999/3690 [16:54<44:15, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▍ | 1001/3690 [16:55<43:14, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▍ | 1003/3690 [16:58<44:23, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▌ | 1005/3690 [17:00<49:16, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▌ | 1007/3690 [17:02<46:50, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▌ | 1009/3690 [17:04<43:42, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▋ | 1011/3690 [17:06<46:04, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 27%|█████████████████████▋ | 1013/3690 [17:08<45:51, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▋ | 1015/3690 [17:10<43:14, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▊ | 1017/3690 [17:12<42:54, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▊ | 1019/3690 [17:14<44:57, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▊ | 1021/3690 [17:16<45:51, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▉ | 1023/3690 [17:18<45:18, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▉ | 1025/3690 [17:20<46:47, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|█████████████████████▉ | 1027/3690 [17:22<45:24, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████ | 1029/3690 [17:24<45:16, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████ | 1030/3690 [17:25<47:40, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████ | 1032/3690 [17:27<46:21, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▏ | 1034/3690 [17:29<45:33, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▏ | 1036/3690 [17:31<43:21, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▏ | 1039/3690 [17:34<42:47, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▎ | 1041/3690 [17:36<43:15, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▎ | 1043/3690 [17:38<41:20, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▎ | 1045/3690 [17:40<44:04, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▍ | 1047/3690 [17:42<44:59, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▍ | 1049/3690 [17:44<43:21, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 28%|██████████████████████▌ | 1051/3690 [17:46<42:10, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▌ | 1053/3690 [17:48<42:48, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▌ | 1055/3690 [17:50<43:02, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▋ | 1057/3690 [17:52<45:31, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▋ | 1059/3690 [17:54<43:45, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▋ | 1061/3690 [17:56<43:31, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▊ | 1063/3690 [17:58<43:39, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▊ | 1065/3690 [18:00<42:53, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▊ | 1066/3690 [18:01<45:38, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▊ | 1068/3690 [18:03<46:20, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▉ | 1071/3690 [18:06<42:49, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▉ | 1072/3690 [18:07<44:49, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|██████████████████████▉ | 1074/3690 [18:09<45:39, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████ | 1076/3690 [18:12<46:04, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████ | 1078/3690 [18:14<47:57, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████ | 1080/3690 [18:16<44:35, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████▏ | 1082/3690 [18:18<45:53, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████▏ | 1084/3690 [18:20<43:18, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████▎ | 1086/3690 [18:22<41:41, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 29%|███████████████████████▎ | 1088/3690 [18:24<42:53, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▎ | 1090/3690 [18:26<44:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▍ | 1092/3690 [18:28<42:12, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▍ | 1094/3690 [18:30<41:09, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▍ | 1096/3690 [18:32<42:46, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▌ | 1098/3690 [18:34<46:00, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▌ | 1101/3690 [18:37<44:14, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▌ | 1102/3690 [18:38<45:19, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▋ | 1104/3690 [18:40<44:19, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▋ | 1106/3690 [18:42<43:17, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▋ | 1108/3690 [18:44<42:32, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▊ | 1110/3690 [18:46<44:24, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▊ | 1113/3690 [18:49<42:35, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▊ | 1115/3690 [18:51<41:26, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▉ | 1117/3690 [18:53<43:51, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▉ | 1119/3690 [18:55<42:51, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|███████████████████████▉ | 1121/3690 [18:57<43:04, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|████████████████████████ | 1123/3690 [18:59<43:52, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 30%|████████████████████████ | 1125/3690 [19:01<42:40, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▏ | 1127/3690 [19:03<42:58, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▏ | 1129/3690 [19:05<39:55, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▏ | 1131/3690 [19:07<39:13, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▎ | 1133/3690 [19:09<39:14, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▎ | 1135/3690 [19:11<43:03, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▎ | 1137/3690 [19:13<45:22, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▍ | 1139/3690 [19:15<46:12, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▍ | 1141/3690 [19:17<44:56, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▍ | 1143/3690 [19:19<43:02, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▌ | 1145/3690 [19:21<42:25, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▌ | 1147/3690 [19:23<43:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▌ | 1149/3690 [19:25<42:36, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▋ | 1151/3690 [19:27<42:31, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▋ | 1152/3690 [19:28<42:28, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▋ | 1154/3690 [19:30<43:22, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▋ | 1156/3690 [19:32<43:48, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▊ | 1158/3690 [19:35<45:42, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▊ | 1160/3690 [19:37<44:46, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 31%|████████████████████████▉ | 1162/3690 [19:39<40:58, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|████████████████████████▉ | 1165/3690 [19:41<38:11, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|████████████████████████▉ | 1167/3690 [19:43<38:42, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████ | 1169/3690 [19:45<40:03, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████ | 1171/3690 [19:47<40:06, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████ | 1173/3690 [19:48<37:24, 1.12it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▏ | 1175/3690 [19:50<39:48, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▏ | 1177/3690 [19:52<40:47, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▏ | 1179/3690 [19:55<41:43, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▎ | 1181/3690 [19:56<41:34, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▎ | 1183/3690 [19:59<42:25, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▎ | 1185/3690 [20:01<43:16, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▍ | 1187/3690 [20:03<41:39, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▍ | 1189/3690 [20:05<41:32, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▍ | 1191/3690 [20:07<42:23, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 32%|█████████████████████████▌ | 1193/3690 [20:09<42:23, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▌ | 1195/3690 [20:11<41:03, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▋ | 1197/3690 [20:13<44:03, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 32%|█████████████████████████▋ | 1199/3690 [20:15<43:50, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▋ | 1201/3690 [20:17<44:39, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▊ | 1203/3690 [20:19<43:09, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▊ | 1205/3690 [20:22<44:23, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▊ | 1207/3690 [20:23<41:09, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▉ | 1209/3690 [20:25<40:43, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▉ | 1211/3690 [20:27<42:26, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|█████████████████████████▉ | 1213/3690 [20:29<41:22, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████ | 1215/3690 [20:32<42:35, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████ | 1217/3690 [20:33<39:53, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████ | 1219/3690 [20:35<41:05, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▏ | 1221/3690 [20:38<42:23, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▏ | 1223/3690 [20:39<40:46, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▏ | 1225/3690 [20:42<42:35, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 33%|██████████████████████████▎ | 1227/3690 [20:44<42:51, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▎ | 1228/3690 [20:45<42:31, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▎ | 1230/3690 [20:47<43:42, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▍ | 1232/3690 [20:49<42:48, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▍ | 1234/3690 [20:51<43:41, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 33%|██████████████████████████▍ | 1236/3690 [20:53<44:02, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▌ | 1238/3690 [20:55<41:17, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▌ | 1240/3690 [20:57<38:30, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▌ | 1242/3690 [20:59<42:08, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▋ | 1244/3690 [21:01<41:44, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▋ | 1246/3690 [21:03<41:37, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▋ | 1248/3690 [21:05<42:06, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▊ | 1250/3690 [21:07<43:58, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▊ | 1252/3690 [21:10<42:22, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▊ | 1253/3690 [21:11<43:51, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▊ | 1255/3690 [21:13<43:12, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▉ | 1257/3690 [21:15<43:01, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 34%|██████████████████████████▉ | 1259/3690 [21:17<39:51, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|██████████████████████████▉ | 1261/3690 [21:19<39:11, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|███████████████████████████ | 1263/3690 [21:21<42:51, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|███████████████████████████ | 1265/3690 [21:23<44:40, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|███████████████████████████▏ | 1267/3690 [21:25<43:09, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|███████████████████████████▏ | 1269/3690 [21:27<42:47, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|███████████████████████████▏ | 1271/3690 [21:30<43:01, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 34%|███████████████████████████▎ | 1273/3690 [21:32<42:03, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▎ | 1275/3690 [21:34<42:12, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▎ | 1277/3690 [21:36<41:34, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▍ | 1279/3690 [21:38<41:42, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▍ | 1281/3690 [21:40<41:03, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▍ | 1283/3690 [21:42<40:18, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▌ | 1285/3690 [21:44<39:06, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▌ | 1287/3690 [21:46<40:22, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▌ | 1288/3690 [21:47<40:56, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▌ | 1290/3690 [21:49<44:33, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 35%|███████████████████████████▋ | 1292/3690 [21:51<43:39, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▋ | 1294/3690 [21:53<41:25, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▋ | 1296/3690 [21:55<38:57, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▊ | 1298/3690 [21:57<39:28, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▊ | 1300/3690 [21:59<38:13, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▊ | 1302/3690 [22:01<38:49, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▉ | 1304/3690 [22:03<39:12, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|███████████████████████████▉ | 1306/3690 [22:05<40:35, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 35%|████████████████████████████ | 1308/3690 [22:07<40:29, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████ | 1310/3690 [22:09<42:08, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████ | 1312/3690 [22:11<41:11, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▏ | 1314/3690 [22:13<39:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▏ | 1316/3690 [22:16<41:09, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▏ | 1318/3690 [22:18<39:58, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▎ | 1320/3690 [22:20<41:24, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▎ | 1322/3690 [22:22<40:45, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▎ | 1324/3690 [22:24<39:43, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 36%|████████████████████████████▍ | 1326/3690 [22:26<40:17, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▍ | 1328/3690 [22:28<42:22, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▍ | 1330/3690 [22:30<40:40, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▌ | 1332/3690 [22:32<38:35, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▌ | 1334/3690 [22:34<39:34, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▌ | 1336/3690 [22:36<39:03, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▋ | 1338/3690 [22:38<39:36, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▋ | 1340/3690 [22:40<39:36, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▋ | 1342/3690 [22:42<38:53, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▊ | 1344/3690 [22:44<37:49, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 36%|████████████████████████████▊ | 1346/3690 [22:46<39:21, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|████████████████████████████▊ | 1348/3690 [22:48<38:59, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|████████████████████████████▉ | 1350/3690 [22:50<38:07, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|████████████████████████████▉ | 1352/3690 [22:52<39:34, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|████████████████████████████▉ | 1354/3690 [22:54<37:31, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████ | 1356/3690 [22:56<37:03, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████ | 1358/3690 [22:58<38:43, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 37%|█████████████████████████████ | 1360/3690 [23:00<39:19, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▏ | 1362/3690 [23:02<42:17, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▏ | 1364/3690 [23:04<40:00, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▏ | 1366/3690 [23:06<38:48, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▎ | 1368/3690 [23:08<39:33, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▎ | 1369/3690 [23:09<40:37, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▎ | 1372/3690 [23:12<37:27, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▍ | 1374/3690 [23:14<38:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▍ | 1376/3690 [23:16<37:41, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▌ | 1378/3690 [23:18<37:03, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▌ | 1380/3690 [23:20<37:15, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 37%|█████████████████████████████▌ | 1382/3690 [23:22<37:23, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▋ | 1384/3690 [23:24<39:13, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▋ | 1386/3690 [23:26<39:48, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▋ | 1388/3690 [23:28<41:22, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▊ | 1390/3690 [23:30<40:04, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▊ | 1391/3690 [23:31<39:22, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|█████████████████████████████▊ | 1393/3690 [23:33<42:02, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▊ | 1395/3690 [23:35<40:04, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▉ | 1398/3690 [23:38<37:52, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████████████████████▉ | 1399/3690 [23:39<38:04, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████ | 1402/3690 [23:42<37:57, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████ | 1404/3690 [23:44<38:26, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████ | 1406/3690 [23:46<37:24, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 1408/3690 [23:48<37:41, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 1410/3690 [23:50<38:43, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 1412/3690 [23:52<37:50, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▎ | 1413/3690 [23:53<38:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▎ | 1415/3690 [23:56<42:00, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▎ | 1417/3690 [23:58<38:48, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▍ | 1419/3690 [24:00<39:34, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▍ | 1421/3690 [24:02<38:56, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▍ | 1423/3690 [24:04<37:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▌ | 1425/3690 [24:06<38:00, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 39%|██████████████████████████████▌ | 1427/3690 [24:08<38:45, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▌ | 1429/3690 [24:10<37:36, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▋ | 1431/3690 [24:12<39:05, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▋ | 1433/3690 [24:14<37:09, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▋ | 1435/3690 [24:16<37:02, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▊ | 1437/3690 [24:18<36:45, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▊ | 1439/3690 [24:20<38:32, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▊ | 1442/3690 [24:22<33:41, 1.11it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▉ | 1444/3690 [24:24<35:53, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|██████████████████████████████▉ | 1446/3690 [24:26<37:14, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|███████████████████████████████ | 1448/3690 [24:28<37:55, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|███████████████████████████████ | 1450/3690 [24:30<37:58, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|███████████████████████████████ | 1452/3690 [24:33<38:13, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|███████████████████████████████ | 1453/3690 [24:34<38:10, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|███████████████████████████████▏ | 1455/3690 [24:36<39:18, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 39%|███████████████████████████████▏ | 1457/3690 [24:38<39:37, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▏ | 1459/3690 [24:40<41:50, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 40%|███████████████████████████████▎ | 1461/3690 [24:42<40:50, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▎ | 1463/3690 [24:44<38:06, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▎ | 1465/3690 [24:46<37:25, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▍ | 1467/3690 [24:48<37:22, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▍ | 1469/3690 [24:50<37:56, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▍ | 1471/3690 [24:53<37:54, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▌ | 1473/3690 [24:54<36:39, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▌ | 1475/3690 [24:56<35:35, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▌ | 1477/3690 [24:58<36:26, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▋ | 1479/3690 [25:01<38:28, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▋ | 1481/3690 [25:03<38:41, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▋ | 1482/3690 [25:04<39:44, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▊ | 1484/3690 [25:06<38:31, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▊ | 1486/3690 [25:08<39:28, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▊ | 1488/3690 [25:10<38:15, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▉ | 1490/3690 [25:12<37:49, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 40%|███████████████████████████████▉ | 1492/3690 [25:14<36:04, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 40%|███████████████████████████████▉ | 1494/3690 [25:16<35:46, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████ | 1496/3690 [25:18<36:02, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████ | 1498/3690 [25:20<37:50, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████ | 1500/3690 [25:22<37:40, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▏ | 1502/3690 [25:24<37:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▏ | 1504/3690 [25:26<38:21, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▏ | 1506/3690 [25:28<37:07, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▎ | 1508/3690 [25:30<36:30, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▎ | 1510/3690 [25:32<36:20, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▎ | 1512/3690 [25:35<39:03, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▍ | 1514/3690 [25:37<36:46, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▍ | 1516/3690 [25:39<36:57, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▍ | 1518/3690 [25:41<37:31, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▌ | 1520/3690 [25:43<38:01, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▌ | 1522/3690 [25:45<38:20, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▌ | 1523/3690 [25:46<39:54, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▋ | 1525/3690 [25:48<40:37, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 41%|████████████████████████████████▋ | 1527/3690 [25:51<39:04, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▊ | 1530/3690 [25:54<37:26, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 41%|████████████████████████████████▊ | 1531/3690 [25:55<39:30, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|████████████████████████████████▊ | 1533/3690 [25:57<40:07, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|████████████████████████████████▊ | 1535/3690 [25:59<38:40, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|████████████████████████████████▉ | 1537/3690 [26:01<35:27, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|████████████████████████████████▉ | 1539/3690 [26:03<35:49, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|████████████████████████████████▉ | 1541/3690 [26:05<35:00, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████ | 1544/3690 [26:08<33:05, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████ | 1545/3690 [26:09<34:36, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████ | 1547/3690 [26:11<34:34, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▏ | 1549/3690 [26:13<37:08, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▏ | 1551/3690 [26:15<36:39, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▏ | 1553/3690 [26:17<35:49, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▎ | 1555/3690 [26:19<36:36, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▎ | 1557/3690 [26:21<35:04, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▍ | 1559/3690 [26:23<35:04, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 42%|█████████████████████████████████▍ | 1561/3690 [26:25<37:05, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▍ | 1563/3690 [26:27<36:38, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▌ | 1565/3690 [26:29<36:26, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 42%|█████████████████████████████████▌ | 1567/3690 [26:31<34:39, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▌ | 1569/3690 [26:33<37:51, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▋ | 1571/3690 [26:35<36:20, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▋ | 1573/3690 [26:37<36:46, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▋ | 1575/3690 [26:39<35:22, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▊ | 1577/3690 [26:41<37:09, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▊ | 1579/3690 [26:44<38:19, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▊ | 1581/3690 [26:46<38:34, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▊ | 1582/3690 [26:47<38:22, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▉ | 1585/3690 [26:50<36:06, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▉ | 1587/3690 [26:52<33:23, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|█████████████████████████████████▉ | 1588/3690 [26:53<36:04, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████ | 1590/3690 [26:55<38:37, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████ | 1592/3690 [26:57<37:29, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 43%|██████████████████████████████████▏ | 1594/3690 [26:59<35:14, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████▏ | 1596/3690 [27:01<36:01, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████▏ | 1598/3690 [27:03<35:14, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████▎ | 1600/3690 [27:06<35:58, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████▎ | 1602/3690 [27:08<36:09, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 43%|██████████████████████████████████▎ | 1604/3690 [27:10<35:58, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▍ | 1606/3690 [27:12<35:36, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▍ | 1607/3690 [27:13<36:10, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▍ | 1609/3690 [27:15<36:14, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▍ | 1611/3690 [27:17<36:45, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▌ | 1613/3690 [27:19<36:12, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▌ | 1615/3690 [27:21<34:53, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▌ | 1617/3690 [27:23<34:01, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▋ | 1619/3690 [27:25<35:32, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▋ | 1621/3690 [27:27<35:27, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▋ | 1623/3690 [27:29<33:49, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▊ | 1625/3690 [27:31<34:21, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 44%|██████████████████████████████████▊ | 1627/3690 [27:33<33:14, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▉ | 1629/3690 [27:35<32:49, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▉ | 1631/3690 [27:37<36:14, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|██████████████████████████████████▉ | 1633/3690 [27:39<35:07, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|███████████████████████████████████ | 1635/3690 [27:41<36:02, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|███████████████████████████████████ | 1637/3690 [27:43<35:13, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|███████████████████████████████████ | 1639/3690 [27:45<35:44, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 44%|███████████████████████████████████▏ | 1641/3690 [27:47<32:36, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▏ | 1643/3690 [27:49<33:58, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▏ | 1645/3690 [27:51<33:13, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▎ | 1647/3690 [27:53<33:59, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▎ | 1650/3690 [27:56<32:17, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▎ | 1652/3690 [27:58<33:20, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▍ | 1654/3690 [28:00<34:15, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▍ | 1655/3690 [28:01<34:48, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▍ | 1657/3690 [28:03<33:48, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▌ | 1660/3690 [28:06<33:43, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 45%|███████████████████████████████████▌ | 1662/3690 [28:08<33:26, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▌ | 1664/3690 [28:10<32:39, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▋ | 1666/3690 [28:12<32:50, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▋ | 1668/3690 [28:14<35:30, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▋ | 1669/3690 [28:15<35:26, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▊ | 1672/3690 [28:18<33:53, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▊ | 1673/3690 [28:19<35:25, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▊ | 1675/3690 [28:21<34:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 45%|███████████████████████████████████▉ | 1677/3690 [28:23<34:25, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|███████████████████████████████████▉ | 1679/3690 [28:25<34:06, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|███████████████████████████████████▉ | 1681/3690 [28:28<35:19, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████ | 1683/3690 [28:30<34:29, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████ | 1685/3690 [28:32<33:46, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████ | 1687/3690 [28:34<35:45, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▏ | 1689/3690 [28:36<33:01, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▏ | 1691/3690 [28:38<35:05, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▏ | 1693/3690 [28:40<34:03, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 46%|████████████████████████████████████▎ | 1695/3690 [28:42<33:58, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▎ | 1697/3690 [28:44<33:55, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▎ | 1699/3690 [28:46<32:09, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▍ | 1701/3690 [28:48<32:40, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▍ | 1703/3690 [28:50<33:53, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▌ | 1705/3690 [28:52<33:48, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▌ | 1707/3690 [28:54<33:47, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▌ | 1709/3690 [28:56<34:14, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▋ | 1711/3690 [28:58<32:16, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▋ | 1713/3690 [29:00<32:31, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 46%|████████████████████████████████████▋ | 1715/3690 [29:02<32:00, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|████████████████████████████████████▊ | 1717/3690 [29:04<30:51, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|████████████████████████████████████▊ | 1719/3690 [29:06<31:12, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|████████████████████████████████████▊ | 1721/3690 [29:08<31:23, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|████████████████████████████████████▉ | 1723/3690 [29:10<31:41, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|████████████████████████████████████▉ | 1726/3690 [29:13<32:07, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|████████████████████████████████████▉ | 1728/3690 [29:15<33:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 47%|█████████████████████████████████████ | 1730/3690 [29:17<32:59, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████ | 1732/3690 [29:19<31:46, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████ | 1734/3690 [29:21<33:04, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▏ | 1736/3690 [29:23<32:13, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▏ | 1738/3690 [29:25<33:23, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▎ | 1740/3690 [29:27<32:58, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▎ | 1742/3690 [29:29<31:39, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▎ | 1744/3690 [29:31<31:17, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▍ | 1746/3690 [29:33<32:37, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▍ | 1748/3690 [29:35<32:54, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▍ | 1750/3690 [29:37<33:07, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 47%|█████████████████████████████████████▍ | 1751/3690 [29:38<34:01, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▌ | 1753/3690 [29:40<33:57, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▌ | 1755/3690 [29:42<32:16, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▌ | 1757/3690 [29:44<32:54, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▋ | 1759/3690 [29:46<33:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▋ | 1761/3690 [29:48<34:57, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 48%|█████████████████████████████████████▋ | 1763/3690 [29:50<33:12, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▊ | 1765/3690 [29:53<33:35, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▊ | 1767/3690 [29:55<33:40, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▊ | 1769/3690 [29:57<32:18, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▉ | 1771/3690 [29:59<32:04, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|█████████████████████████████████████▉ | 1773/3690 [30:01<32:04, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████ | 1775/3690 [30:03<32:27, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████ | 1777/3690 [30:05<31:52, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████ | 1779/3690 [30:07<33:07, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████ | 1780/3690 [30:08<34:30, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████▏ | 1783/3690 [30:11<31:17, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████▏ | 1785/3690 [30:13<31:05, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████▏ | 1786/3690 [30:14<31:47, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 48%|██████████████████████████████████████▎ | 1788/3690 [30:16<31:07, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▎ | 1790/3690 [30:18<32:09, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▎ | 1792/3690 [30:20<31:38, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▍ | 1794/3690 [30:22<32:12, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 49%|██████████████████████████████████████▍ | 1796/3690 [30:24<33:03, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▍ | 1798/3690 [30:26<32:59, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▌ | 1800/3690 [30:28<33:06, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▌ | 1802/3690 [30:30<31:37, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▌ | 1804/3690 [30:32<31:41, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▋ | 1806/3690 [30:34<30:54, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▋ | 1808/3690 [30:36<30:28, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▊ | 1810/3690 [30:38<32:14, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▊ | 1812/3690 [30:41<34:24, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▊ | 1814/3690 [30:43<32:21, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▉ | 1816/3690 [30:45<32:09, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▉ | 1818/3690 [30:47<33:35, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▉ | 1819/3690 [30:48<33:16, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|██████████████████████████████████████▉ | 1821/3690 [30:50<35:24, 1.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|███████████████████████████████████████ | 1823/3690 [30:52<33:45, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 49%|███████████████████████████████████████ | 1825/3690 [30:55<33:44, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████ | 1827/3690 [30:56<31:39, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 50%|███████████████████████████████████████▏ | 1829/3690 [30:58<31:08, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▏ | 1831/3690 [31:00<31:13, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▏ | 1833/3690 [31:03<31:55, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▎ | 1835/3690 [31:05<32:02, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▎ | 1837/3690 [31:07<32:37, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▎ | 1839/3690 [31:09<33:21, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▍ | 1841/3690 [31:11<33:34, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▍ | 1843/3690 [31:13<32:31, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▌ | 1845/3690 [31:15<31:39, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▌ | 1847/3690 [31:17<31:04, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▌ | 1849/3690 [31:19<31:12, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▋ | 1851/3690 [31:21<30:53, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▋ | 1852/3690 [31:22<32:32, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▋ | 1855/3690 [31:25<30:38, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▊ | 1857/3690 [31:27<29:41, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▊ | 1859/3690 [31:29<29:32, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 50%|███████████████████████████████████████▊ | 1861/3690 [31:31<31:23, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 50%|███████████████████████████████████████▉ | 1863/3690 [31:33<30:22, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|███████████████████████████████████████▉ | 1865/3690 [31:35<30:50, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|███████████████████████████████████████▉ | 1867/3690 [31:37<31:35, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|███████████████████████████████████████▉ | 1868/3690 [31:39<32:20, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████ | 1870/3690 [31:41<33:32, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████ | 1872/3690 [31:43<32:47, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████ | 1874/3690 [31:45<30:46, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▏ | 1876/3690 [31:47<31:25, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▏ | 1878/3690 [31:49<28:49, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▏ | 1880/3690 [31:51<30:37, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▎ | 1882/3690 [31:53<30:07, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▎ | 1884/3690 [31:55<30:38, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▍ | 1886/3690 [31:57<29:32, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▍ | 1888/3690 [31:59<30:22, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▍ | 1890/3690 [32:01<30:09, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▌ | 1892/3690 [32:03<28:57, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▌ | 1894/3690 [32:05<29:49, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 51%|████████████████████████████████████████▌ | 1896/3690 [32:07<30:22, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▋ | 1898/3690 [32:09<29:47, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 51%|████████████████████████████████████████▋ | 1900/3690 [32:11<28:31, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▋ | 1902/3690 [32:13<31:50, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▊ | 1904/3690 [32:15<30:14, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▊ | 1906/3690 [32:17<28:44, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▊ | 1908/3690 [32:19<30:27, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▉ | 1910/3690 [32:21<32:02, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▉ | 1912/3690 [32:23<31:16, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|████████████████████████████████████████▉ | 1914/3690 [32:25<31:21, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████ | 1916/3690 [32:27<30:31, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████ | 1917/3690 [32:29<32:04, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████ | 1920/3690 [32:32<30:48, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▏ | 1921/3690 [32:33<31:35, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▏ | 1923/3690 [32:35<30:49, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▏ | 1925/3690 [32:37<30:10, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▎ | 1927/3690 [32:39<30:11, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 52%|█████████████████████████████████████████▎ | 1929/3690 [32:41<30:19, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▎ | 1931/3690 [32:43<30:58, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▍ | 1933/3690 [32:45<29:07, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▍ | 1935/3690 [32:47<28:01, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 52%|█████████████████████████████████████████▍ | 1937/3690 [32:49<29:56, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▌ | 1939/3690 [32:51<31:36, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▌ | 1941/3690 [32:53<29:53, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▌ | 1943/3690 [32:55<29:52, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▋ | 1945/3690 [32:57<28:51, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▋ | 1948/3690 [33:00<27:29, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▋ | 1950/3690 [33:02<27:37, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 53%|█████████████████████████████████████████▊ | 1952/3690 [33:04<28:45, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▊ | 1954/3690 [33:06<25:45, 1.12it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▉ | 1956/3690 [33:08<28:11, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▉ | 1958/3690 [33:10<28:19, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|█████████████████████████████████████████▉ | 1960/3690 [33:12<28:26, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|██████████████████████████████████████████ | 1962/3690 [33:14<28:47, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|██████████████████████████████████████████ | 1964/3690 [33:16<28:59, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|██████████████████████████████████████████ | 1967/3690 [33:18<25:45, 1.11it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|██████████████████████████████████████████▏ | 1968/3690 [33:19<28:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|██████████████████████████████████████████▏ | 1970/3690 [33:21<28:32, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 53%|██████████████████████████████████████████▏ | 1972/3690 [33:23<28:57, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 54%|██████████████████████████████████████████▎ | 1975/3690 [33:26<27:19, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▎ | 1976/3690 [33:27<28:34, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▎ | 1979/3690 [33:30<26:19, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▍ | 1980/3690 [33:31<28:29, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▍ | 1983/3690 [33:34<29:11, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▍ | 1985/3690 [33:36<29:14, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▌ | 1987/3690 [33:38<28:57, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▌ | 1989/3690 [33:40<27:54, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▋ | 1991/3690 [33:42<27:35, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▋ | 1993/3690 [33:44<27:32, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▋ | 1995/3690 [33:46<28:18, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 54%|██████████████████████████████████████████▊ | 1997/3690 [33:48<28:35, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▊ | 1999/3690 [33:50<26:58, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▊ | 2001/3690 [33:52<28:40, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▉ | 2003/3690 [33:54<27:48, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▉ | 2005/3690 [33:56<27:38, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|██████████████████████████████████████████▉ | 2007/3690 [33:58<27:16, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|███████████████████████████████████████████ | 2009/3690 [34:00<29:22, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 54%|███████████████████████████████████████████ | 2011/3690 [34:02<29:30, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████ | 2013/3690 [34:04<28:52, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▏ | 2015/3690 [34:06<27:50, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▏ | 2017/3690 [34:08<27:42, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 55%|███████████████████████████████████████████▏ | 2019/3690 [34:10<27:44, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▎ | 2021/3690 [34:12<29:14, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▎ | 2023/3690 [34:15<29:20, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▎ | 2025/3690 [34:16<27:23, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▍ | 2027/3690 [34:18<26:41, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▍ | 2029/3690 [34:20<25:53, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▍ | 2031/3690 [34:22<26:05, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▌ | 2033/3690 [34:24<26:15, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▌ | 2035/3690 [34:26<27:25, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▌ | 2037/3690 [34:28<28:02, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▋ | 2039/3690 [34:30<28:26, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 55%|███████████████████████████████████████████▋ | 2041/3690 [34:32<29:42, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▋ | 2043/3690 [34:34<28:47, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▊ | 2045/3690 [34:37<28:03, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 55%|███████████████████████████████████████████▊ | 2047/3690 [34:39<27:55, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 2049/3690 [34:41<27:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▉ | 2051/3690 [34:43<27:50, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▉ | 2053/3690 [34:44<25:11, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▉ | 2055/3690 [34:46<24:47, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████ | 2057/3690 [34:48<25:18, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████ | 2059/3690 [34:50<26:00, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████ | 2061/3690 [34:52<26:16, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 56%|████████████████████████████████████████████▏ | 2063/3690 [34:54<25:58, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▏ | 2065/3690 [34:56<29:26, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▎ | 2067/3690 [34:58<28:36, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▎ | 2069/3690 [35:00<27:01, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▎ | 2070/3690 [35:01<26:52, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▎ | 2072/3690 [35:03<27:08, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▍ | 2075/3690 [35:06<26:13, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▍ | 2077/3690 [35:08<26:09, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▌ | 2079/3690 [35:10<26:24, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▌ | 2081/3690 [35:12<25:42, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|████████████████████████████████████████████▌ | 2083/3690 [35:14<24:31, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 57%|████████████████████████████████████████████▋ | 2085/3690 [35:16<26:09, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▋ | 2087/3690 [35:18<26:21, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▋ | 2089/3690 [35:20<27:20, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▊ | 2091/3690 [35:22<27:47, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▊ | 2093/3690 [35:24<28:35, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▊ | 2094/3690 [35:25<29:22, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▊ | 2096/3690 [35:28<30:56, 1.16s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▉ | 2098/3690 [35:30<29:22, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|████████████████████████████████████████████▉ | 2100/3690 [35:32<27:46, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████ | 2102/3690 [35:34<25:40, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████ | 2105/3690 [35:37<24:56, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 57%|█████████████████████████████████████████████ | 2106/3690 [35:38<26:14, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▏ | 2108/3690 [35:40<27:46, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▏ | 2110/3690 [35:42<27:30, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▏ | 2112/3690 [35:44<25:37, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▎ | 2114/3690 [35:46<26:37, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▎ | 2116/3690 [35:48<26:11, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▎ | 2118/3690 [35:50<26:48, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 57%|█████████████████████████████████████████████▍ | 2120/3690 [35:52<25:12, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▍ | 2122/3690 [35:54<27:41, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▍ | 2124/3690 [35:56<26:24, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▌ | 2126/3690 [35:58<26:54, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 58%|█████████████████████████████████████████████▌ | 2128/3690 [36:00<28:10, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▌ | 2130/3690 [36:02<25:28, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▋ | 2132/3690 [36:04<25:35, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▋ | 2134/3690 [36:06<27:47, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▋ | 2136/3690 [36:08<24:16, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▊ | 2138/3690 [36:10<25:13, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▊ | 2140/3690 [36:12<24:48, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▊ | 2142/3690 [36:14<25:02, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▉ | 2144/3690 [36:16<25:31, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▉ | 2146/3690 [36:18<26:36, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|█████████████████████████████████████████████▉ | 2148/3690 [36:20<26:49, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 58%|██████████████████████████████████████████████ | 2151/3690 [36:23<26:46, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|██████████████████████████████████████████████ | 2153/3690 [36:25<25:25, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|██████████████████████████████████████████████▏ | 2155/3690 [36:27<25:23, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 58%|██████████████████████████████████████████████▏ | 2157/3690 [36:29<24:45, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▏ | 2159/3690 [36:31<24:48, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▎ | 2161/3690 [36:33<25:36, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▎ | 2163/3690 [36:35<25:59, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▎ | 2164/3690 [36:36<27:25, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▎ | 2166/3690 [36:38<26:39, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▍ | 2169/3690 [36:42<25:37, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▍ | 2170/3690 [36:43<25:33, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 59%|██████████████████████████████████████████████▌ | 2172/3690 [36:45<26:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▌ | 2174/3690 [36:47<27:11, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▌ | 2176/3690 [36:49<26:44, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▋ | 2178/3690 [36:51<26:20, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▋ | 2180/3690 [36:53<26:17, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▋ | 2182/3690 [36:55<26:04, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▊ | 2184/3690 [36:57<24:59, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▊ | 2186/3690 [36:59<26:07, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▊ | 2188/3690 [37:01<25:19, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▉ | 2190/3690 [37:03<24:19, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 59%|██████████████████████████████████████████████▉ | 2192/3690 [37:05<24:17, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 59%|██████████████████████████████████████████████▉ | 2194/3690 [37:07<24:43, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████ | 2196/3690 [37:09<26:10, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████ | 2198/3690 [37:11<25:49, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████ | 2200/3690 [37:13<25:39, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▏ | 2202/3690 [37:15<25:12, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▏ | 2204/3690 [37:17<25:05, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▏ | 2206/3690 [37:19<25:37, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▎ | 2208/3690 [37:21<25:02, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▎ | 2210/3690 [37:23<24:47, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▎ | 2212/3690 [37:25<24:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▍ | 2214/3690 [37:27<24:38, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 60%|███████████████████████████████████████████████▍ | 2216/3690 [37:29<24:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▍ | 2218/3690 [37:31<23:17, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▌ | 2220/3690 [37:33<23:34, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▌ | 2222/3690 [37:35<23:26, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▌ | 2224/3690 [37:37<23:29, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▋ | 2226/3690 [37:39<24:17, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▋ | 2228/3690 [37:41<23:59, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▋ | 2230/3690 [37:43<23:28, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 60%|███████████████████████████████████████████████▊ | 2232/3690 [37:45<25:00, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|███████████████████████████████████████████████▊ | 2234/3690 [37:47<25:04, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|███████████████████████████████████████████████▊ | 2236/3690 [37:49<25:29, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 61%|███████████████████████████████████████████████▉ | 2238/3690 [37:51<25:01, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|███████████████████████████████████████████████▉ | 2240/3690 [37:53<24:38, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|███████████████████████████████████████████████▉ | 2242/3690 [37:55<23:30, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████ | 2244/3690 [37:57<24:01, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████ | 2246/3690 [37:59<24:03, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▏ | 2248/3690 [38:01<24:07, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▏ | 2250/3690 [38:03<24:27, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▏ | 2252/3690 [38:05<23:21, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▎ | 2254/3690 [38:07<22:29, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▎ | 2256/3690 [38:09<23:43, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▎ | 2258/3690 [38:11<25:07, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 61%|████████████████████████████████████████████████▍ | 2260/3690 [38:13<22:41, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▍ | 2263/3690 [38:16<22:38, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▍ | 2265/3690 [38:18<23:25, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▌ | 2267/3690 [38:20<23:04, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 61%|████████████████████████████████████████████████▌ | 2269/3690 [38:22<23:33, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▌ | 2271/3690 [38:24<23:06, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▋ | 2273/3690 [38:26<22:03, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▋ | 2275/3690 [38:27<22:33, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▋ | 2277/3690 [38:29<23:05, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▊ | 2279/3690 [38:31<22:25, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▊ | 2281/3690 [38:33<23:01, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 62%|████████████████████████████████████████████████▉ | 2283/3690 [38:35<22:55, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▉ | 2285/3690 [38:37<22:05, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|████████████████████████████████████████████████▉ | 2288/3690 [38:40<21:41, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████ | 2290/3690 [38:42<21:44, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████ | 2292/3690 [38:44<23:19, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████ | 2293/3690 [38:45<23:28, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████▏ | 2296/3690 [38:48<23:05, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████▏ | 2298/3690 [38:50<22:58, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████▏ | 2299/3690 [38:51<23:50, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████▎ | 2302/3690 [38:54<23:33, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 62%|█████████████████████████████████████████████████▎ | 2304/3690 [38:56<23:36, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 62%|█████████████████████████████████████████████████▎ | 2305/3690 [38:57<23:45, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▍ | 2308/3690 [39:00<22:41, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▍ | 2309/3690 [39:01<23:11, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▍ | 2311/3690 [39:03<23:37, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▌ | 2313/3690 [39:05<22:48, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▌ | 2316/3690 [39:08<23:10, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▋ | 2318/3690 [39:10<22:31, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▋ | 2319/3690 [39:12<24:08, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▋ | 2321/3690 [39:14<24:44, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▋ | 2323/3690 [39:16<24:09, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▊ | 2325/3690 [39:18<23:20, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 63%|█████████████████████████████████████████████████▊ | 2327/3690 [39:20<22:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▊ | 2329/3690 [39:22<22:42, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▉ | 2331/3690 [39:24<22:56, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▉ | 2333/3690 [39:26<24:04, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|█████████████████████████████████████████████████▉ | 2335/3690 [39:28<24:43, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|██████████████████████████████████████████████████ | 2337/3690 [39:30<23:28, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|██████████████████████████████████████████████████ | 2338/3690 [39:32<24:12, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|██████████████████████████████████████████████████ | 2341/3690 [39:34<22:31, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 63%|██████████████████████████████████████████████████▏ | 2342/3690 [39:36<23:15, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▏ | 2344/3690 [39:38<24:11, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▏ | 2346/3690 [39:40<24:30, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 64%|██████████████████████████████████████████████████▎ | 2348/3690 [39:42<23:33, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▎ | 2350/3690 [39:44<22:21, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▎ | 2352/3690 [39:46<22:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▍ | 2354/3690 [39:48<22:56, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▍ | 2356/3690 [39:50<21:17, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▍ | 2358/3690 [39:52<21:31, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▌ | 2360/3690 [39:54<21:54, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▌ | 2362/3690 [39:56<21:34, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▌ | 2364/3690 [39:58<21:47, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▋ | 2366/3690 [40:00<21:32, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▋ | 2368/3690 [40:02<22:45, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 64%|██████████████████████████████████████████████████▋ | 2370/3690 [40:04<23:04, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▊ | 2372/3690 [40:06<22:31, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▊ | 2374/3690 [40:08<22:59, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▊ | 2376/3690 [40:10<21:10, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▉ | 2378/3690 [40:12<21:11, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 64%|██████████████████████████████████████████████████▉ | 2380/3690 [40:14<21:46, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|██████████████████████████████████████████████████▉ | 2382/3690 [40:16<22:41, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████ | 2384/3690 [40:18<22:29, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████ | 2385/3690 [40:19<23:06, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▏ | 2388/3690 [40:22<22:10, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▏ | 2390/3690 [40:24<22:13, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▏ | 2392/3690 [40:26<21:26, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▎ | 2394/3690 [40:28<21:00, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▎ | 2396/3690 [40:30<20:55, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▎ | 2398/3690 [40:32<20:17, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 2400/3690 [40:34<21:13, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 2402/3690 [40:36<21:55, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 2404/3690 [40:38<22:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▌ | 2406/3690 [40:40<21:41, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▌ | 2408/3690 [40:42<22:15, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▌ | 2410/3690 [40:44<22:22, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▋ | 2412/3690 [40:46<20:49, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▋ | 2414/3690 [40:48<20:30, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▋ | 2416/3690 [40:50<21:03, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|███████████████████████████████████████████████████▊ | 2418/3690 [40:52<21:22, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|███████████████████████████████████████████████████▊ | 2420/3690 [40:54<20:49, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|███████████████████████████████████████████████████▊ | 2422/3690 [40:56<21:38, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length.
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|███████████████████████████████████████████████████▉ | 2424/3690 [40:58<19:43, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|███████████████████████████████████████████████████▉ | 2426/3690 [41:00<20:28, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|███████████████████████████████████████████████████▉ | 2428/3690 [41:02<21:16, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████ | 2430/3690 [41:04<21:02, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████ | 2432/3690 [41:06<20:44, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 66%|████████████████████████████████████████████████████ | 2434/3690 [41:08<19:44, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▏ | 2436/3690 [41:10<20:58, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▏ | 2438/3690 [41:12<21:00, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▏ | 2440/3690 [41:14<22:01, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▎ | 2442/3690 [41:16<21:43, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▎ | 2444/3690 [41:18<20:57, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▎ | 2446/3690 [41:20<20:49, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▍ | 2448/3690 [41:22<21:11, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▍ | 2450/3690 [41:24<21:10, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 66%|████████████████████████████████████████████████████▍ | 2452/3690 [41:26<20:15, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▌ | 2454/3690 [41:28<19:55, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▌ | 2456/3690 [41:30<21:05, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▌ | 2458/3690 [41:32<20:22, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▋ | 2460/3690 [41:34<20:23, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▋ | 2462/3690 [41:36<19:58, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▊ | 2464/3690 [41:38<20:12, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 67%|████████████████████████████████████████████████████▊ | 2466/3690 [41:40<20:25, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▊ | 2469/3690 [41:43<20:09, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▉ | 2470/3690 [41:44<20:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▉ | 2472/3690 [41:46<21:07, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|████████████████████████████████████████████████████▉ | 2474/3690 [41:48<21:52, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████ | 2476/3690 [41:50<21:30, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████ | 2479/3690 [41:54<20:56, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████ | 2481/3690 [41:55<20:06, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████▏ | 2483/3690 [41:57<19:42, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████▏ | 2485/3690 [41:59<19:54, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████▏ | 2487/3690 [42:01<19:54, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 67%|█████████████████████████████████████████████████████▎ | 2489/3690 [42:03<19:07, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▎ | 2491/3690 [42:05<19:35, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▎ | 2493/3690 [42:07<19:18, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▍ | 2495/3690 [42:09<19:02, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▍ | 2497/3690 [42:11<20:39, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 68%|█████████████████████████████████████████████████████▌ | 2499/3690 [42:13<18:56, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▌ | 2501/3690 [42:15<18:47, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▌ | 2503/3690 [42:17<20:35, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▌ | 2504/3690 [42:18<21:24, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▋ | 2506/3690 [42:20<21:15, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▋ | 2508/3690 [42:23<21:25, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▋ | 2510/3690 [42:24<18:50, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▊ | 2512/3690 [42:27<20:36, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▊ | 2514/3690 [42:29<20:10, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▊ | 2516/3690 [42:31<19:54, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▉ | 2519/3690 [42:34<20:11, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|█████████████████████████████████████████████████████▉ | 2521/3690 [42:35<18:45, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|██████████████████████████████████████████████████████ | 2523/3690 [42:37<18:50, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|██████████████████████████████████████████████████████ | 2524/3690 [42:39<20:09, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 68%|██████████████████████████████████████████████████████ | 2526/3690 [42:41<19:16, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████ | 2528/3690 [42:43<20:19, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 69%|██████████████████████████████████████████████████████▏ | 2530/3690 [42:45<19:42, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▏ | 2532/3690 [42:47<20:00, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▎ | 2534/3690 [42:49<18:37, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▎ | 2536/3690 [42:51<19:03, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▎ | 2538/3690 [42:53<19:30, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▍ | 2540/3690 [42:55<20:28, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▍ | 2542/3690 [42:57<19:59, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▍ | 2544/3690 [42:59<19:34, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▌ | 2546/3690 [43:01<17:09, 1.11it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▌ | 2549/3690 [43:04<19:27, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▌ | 2551/3690 [43:06<18:33, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▋ | 2553/3690 [43:08<18:44, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▋ | 2555/3690 [43:10<18:55, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▋ | 2557/3690 [43:12<19:38, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▊ | 2559/3690 [43:14<18:59, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▊ | 2561/3690 [43:16<19:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 69%|██████████████████████████████████████████████████████▊ | 2563/3690 [43:18<18:36, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 69%|██████████████████████████████████████████████████████▉ | 2564/3690 [43:19<19:04, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|██████████████████████████████████████████████████████▉ | 2567/3690 [43:22<18:41, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|██████████████████████████████████████████████████████▉ | 2568/3690 [43:23<18:27, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████ | 2570/3690 [43:25<20:44, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████ | 2572/3690 [43:27<19:26, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████ | 2574/3690 [43:29<18:50, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▏ | 2576/3690 [43:31<17:34, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▏ | 2579/3690 [43:34<17:59, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▎ | 2581/3690 [43:36<17:47, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▎ | 2583/3690 [43:38<18:29, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▎ | 2585/3690 [43:40<18:36, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▎ | 2586/3690 [43:41<18:53, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▍ | 2588/3690 [43:43<17:55, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▍ | 2590/3690 [43:45<19:24, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▌ | 2593/3690 [43:48<18:24, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 70%|███████████████████████████████████████████████████████▌ | 2594/3690 [43:49<18:53, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▌ | 2596/3690 [43:52<19:14, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▌ | 2598/3690 [43:54<18:43, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 70%|███████████████████████████████████████████████████████▋ | 2600/3690 [43:56<19:27, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▋ | 2602/3690 [43:58<18:52, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▋ | 2604/3690 [44:00<19:09, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▊ | 2606/3690 [44:02<18:48, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▊ | 2608/3690 [44:04<17:36, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▉ | 2610/3690 [44:06<17:57, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▉ | 2612/3690 [44:08<17:01, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|███████████████████████████████████████████████████████▉ | 2614/3690 [44:10<18:10, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████ | 2616/3690 [44:12<17:40, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████ | 2618/3690 [44:14<18:49, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████ | 2620/3690 [44:16<18:57, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▏ | 2622/3690 [44:18<18:54, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▏ | 2624/3690 [44:20<18:13, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 71%|████████████████████████████████████████████████████████▏ | 2626/3690 [44:23<19:20, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▎ | 2628/3690 [44:25<18:24, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▎ | 2629/3690 [44:26<18:08, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▎ | 2632/3690 [44:29<18:06, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▎ | 2633/3690 [44:30<18:17, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▍ | 2635/3690 [44:32<18:22, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 71%|████████████████████████████████████████████████████████▍ | 2637/3690 [44:34<18:10, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▌ | 2640/3690 [44:37<17:08, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▌ | 2641/3690 [44:38<17:36, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▌ | 2644/3690 [44:41<17:31, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▋ | 2646/3690 [44:43<17:39, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▋ | 2648/3690 [44:45<17:15, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▋ | 2650/3690 [44:47<17:07, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▊ | 2651/3690 [44:48<17:02, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▊ | 2653/3690 [44:50<18:33, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▊ | 2655/3690 [44:52<18:16, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 72%|████████████████████████████████████████████████████████▉ | 2657/3690 [44:54<17:58, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▉ | 2659/3690 [44:56<17:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|████████████████████████████████████████████████████████▉ | 2661/3690 [44:58<16:53, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████ | 2664/3690 [45:01<15:28, 1.11it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████ | 2666/3690 [45:03<15:37, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████ | 2668/3690 [45:05<17:07, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 2670/3690 [45:07<17:15, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 2671/3690 [45:08<18:09, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 2673/3690 [45:10<17:24, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▎ | 2676/3690 [45:13<15:20, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▎ | 2678/3690 [45:15<16:37, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▍ | 2680/3690 [45:17<16:28, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▍ | 2682/3690 [45:19<17:05, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▍ | 2684/3690 [45:21<16:43, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▌ | 2686/3690 [45:23<16:28, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▌ | 2687/3690 [45:24<17:16, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 73%|█████████████████████████████████████████████████████████▌ | 2689/3690 [45:26<17:47, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▌ | 2691/3690 [45:28<17:52, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▋ | 2694/3690 [45:31<16:05, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▋ | 2696/3690 [45:33<15:41, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▊ | 2698/3690 [45:35<15:16, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▊ | 2700/3690 [45:37<15:48, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▊ | 2702/3690 [45:39<15:58, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▉ | 2704/3690 [45:41<16:30, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▉ | 2706/3690 [45:43<17:00, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|█████████████████████████████████████████████████████████▉ | 2708/3690 [45:45<15:44, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 2710/3690 [45:46<15:12, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 2712/3690 [45:48<15:07, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▏ | 2715/3690 [45:51<15:15, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▏ | 2717/3690 [45:53<15:30, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▏ | 2719/3690 [45:55<15:31, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▎ | 2721/3690 [45:57<15:17, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 74%|██████████████████████████████████████████████████████████▎ | 2723/3690 [45:59<15:24, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▎ | 2725/3690 [46:01<16:08, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▍ | 2727/3690 [46:03<15:52, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▍ | 2729/3690 [46:05<16:07, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▍ | 2731/3690 [46:07<15:59, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▌ | 2733/3690 [46:09<16:25, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▌ | 2735/3690 [46:11<15:24, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▌ | 2737/3690 [46:13<16:05, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▋ | 2739/3690 [46:15<15:35, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▋ | 2741/3690 [46:17<15:24, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▋ | 2743/3690 [46:19<15:46, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▊ | 2745/3690 [46:21<15:07, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▊ | 2747/3690 [46:23<15:35, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 74%|██████████████████████████████████████████████████████████▊ | 2749/3690 [46:25<16:50, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|██████████████████████████████████████████████████████████▉ | 2751/3690 [46:27<16:53, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|██████████████████████████████████████████████████████████▉ | 2753/3690 [46:29<15:39, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 75%|██████████████████████████████████████████████████████████▉ | 2755/3690 [46:31<15:24, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████ | 2757/3690 [46:33<15:06, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████ | 2759/3690 [46:35<15:25, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████ | 2761/3690 [46:37<15:45, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▏ | 2763/3690 [46:39<15:24, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▏ | 2765/3690 [46:41<15:15, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▏ | 2767/3690 [46:43<15:30, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 2769/3690 [46:45<15:36, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 2771/3690 [46:47<15:30, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 2773/3690 [46:49<14:35, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▍ | 2775/3690 [46:51<14:50, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. 
If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▍ | 2777/3690 [46:53<14:27, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▍ | 2779/3690 [46:55<15:04, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▌ | 2781/3690 [46:57<15:30, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▌ | 2783/3690 [46:59<15:21, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▌ | 2785/3690 [47:01<14:39, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 76%|███████████████████████████████████████████████████████████▋ | 2788/3690 [47:03<13:40, 1.10it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|███████████████████████████████████████████████████████████▋ | 2789/3690 [47:04<13:43, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|███████████████████████████████████████████████████████████▊ | 2792/3690 [47:07<14:09, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|███████████████████████████████████████████████████████████▊ | 2794/3690 [47:09<13:57, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|███████████████████████████████████████████████████████████▊ | 2796/3690 [47:11<14:55, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 76%|███████████████████████████████████████████████████████████▉ | 2798/3690 [47:13<15:39, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|███████████████████████████████████████████████████████████▉ | 2800/3690 [47:15<14:50, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|███████████████████████████████████████████████████████████▉ | 2801/3690 [47:16<14:45, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████ | 2803/3690 [47:19<15:58, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████ | 2805/3690 [47:21<15:50, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 76%|████████████████████████████████████████████████████████████ | 2807/3690 [47:23<14:52, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████▏ | 2809/3690 [47:24<13:43, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████▏ | 2811/3690 [47:26<14:24, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████▏ | 2813/3690 [47:29<14:48, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████▎ | 2815/3690 [47:31<14:33, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 76%|████████████████████████████████████████████████████████████▎ | 2817/3690 [47:33<15:01, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████▎ | 2819/3690 [47:35<15:34, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 76%|████████████████████████████████████████████████████████████▍ | 2821/3690 [47:37<14:00, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▍ | 2823/3690 [47:39<14:46, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▌ | 2826/3690 [47:42<14:05, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 77%|████████████████████████████████████████████████████████████▌ | 2827/3690 [47:43<14:28, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▌ | 2830/3690 [47:46<14:12, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▌ | 2831/3690 [47:47<14:34, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▋ | 2834/3690 [47:49<13:20, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▋ | 2836/3690 [47:51<12:40, 1.12it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 77%|████████████████████████████████████████████████████████████▊ | 2838/3690 [47:53<13:09, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▊ | 2840/3690 [47:55<14:10, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▊ | 2842/3690 [47:57<14:31, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▉ | 2844/3690 [47:59<14:27, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|████████████████████████████████████████████████████████████▉ | 2846/3690 [48:01<14:16, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 77%|████████████████████████████████████████████████████████████▉ | 2848/3690 [48:03<14:52, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|█████████████████████████████████████████████████████████████ | 2850/3690 [48:05<13:27, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|█████████████████████████████████████████████████████████████ | 2852/3690 [48:07<13:40, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|█████████████████████████████████████████████████████████████ | 2854/3690 [48:09<14:12, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 77%|█████████████████████████████████████████████████████████████▏ | 2856/3690 [48:11<13:15, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 77%|█████████████████████████████████████████████████████████████▏ | 2858/3690 [48:13<13:54, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▏ | 2860/3690 [48:15<13:24, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▎ | 2862/3690 [48:17<14:08, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▎ | 2863/3690 [48:18<14:57, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▎ | 2865/3690 [48:21<14:33, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 78%|█████████████████████████████████████████████████████████████▍ | 2867/3690 [48:23<14:29, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▍ | 2869/3690 [48:25<13:57, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▍ | 2871/3690 [48:27<13:41, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▌ | 2873/3690 [48:29<13:22, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▌ | 2875/3690 [48:31<13:57, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 78%|█████████████████████████████████████████████████████████████▌ | 2878/3690 [48:33<12:48, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▋ | 2880/3690 [48:35<13:26, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▋ | 2882/3690 [48:37<12:50, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▋ | 2884/3690 [48:39<13:34, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▊ | 2886/3690 [48:41<13:46, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 78%|█████████████████████████████████████████████████████████████▊ | 2887/3690 [48:43<14:23, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▊ | 2890/3690 [48:46<13:03, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▉ | 2891/3690 [48:47<13:12, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|█████████████████████████████████████████████████████████████▉ | 2894/3690 [48:50<13:19, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 78%|██████████████████████████████████████████████████████████████ | 2896/3690 [48:52<13:19, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 79%|██████████████████████████████████████████████████████████████ | 2897/3690 [48:53<13:41, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████ | 2899/3690 [48:55<14:02, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████ | 2901/3690 [48:57<13:21, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▏ | 2903/3690 [48:59<13:32, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▏ | 2906/3690 [49:02<12:43, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 79%|██████████████████████████████████████████████████████████████▏ | 2907/3690 [49:03<13:13, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▎ | 2910/3690 [49:06<12:56, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▎ | 2912/3690 [49:08<12:54, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▍ | 2914/3690 [49:10<13:05, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▍ | 2915/3690 [49:11<13:52, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 79%|██████████████████████████████████████████████████████████████▍ | 2917/3690 [49:13<14:31, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▍ | 2919/3690 [49:15<13:30, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▌ | 2921/3690 [49:17<13:17, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▌ | 2923/3690 [49:20<13:19, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▌ | 2925/3690 [49:21<12:38, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 79%|██████████████████████████████████████████████████████████████▋ | 2927/3690 [49:23<12:04, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▋ | 2929/3690 [49:25<12:44, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▊ | 2931/3690 [49:27<12:42, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 79%|██████████████████████████████████████████████████████████████▊ | 2933/3690 [49:30<13:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|██████████████████████████████████████████████████████████████▊ | 2935/3690 [49:32<13:10, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 80%|██████████████████████████████████████████████████████████████▊ | 2936/3690 [49:33<12:51, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|██████████████████████████████████████████████████████████████▉ | 2938/3690 [49:35<13:03, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|██████████████████████████████████████████████████████████████▉ | 2940/3690 [49:37<12:50, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|██████████████████████████████████████████████████████████████▉ | 2942/3690 [49:39<12:47, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████ | 2944/3690 [49:41<13:22, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 80%|███████████████████████████████████████████████████████████████ | 2946/3690 [49:43<12:30, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████ | 2948/3690 [49:45<12:29, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▏ | 2950/3690 [49:47<11:54, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▏ | 2952/3690 [49:49<12:29, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▎ | 2955/3690 [49:52<11:55, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 80%|███████████████████████████████████████████████████████████████▎ | 2957/3690 [49:54<11:38, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▎ | 2959/3690 [49:56<12:11, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▍ | 2961/3690 [49:58<12:35, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▍ | 2963/3690 [50:00<11:51, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▍ | 2965/3690 [50:02<12:38, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 80%|███████████████████████████████████████████████████████████████▌ | 2967/3690 [50:04<11:36, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 80%|███████████████████████████████████████████████████████████████▌ | 2969/3690 [50:06<11:52, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▌ | 2971/3690 [50:08<11:25, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▋ | 2973/3690 [50:09<11:40, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▋ | 2975/3690 [50:12<12:10, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 81%|███████████████████████████████████████████████████████████████▋ | 2977/3690 [50:14<12:19, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▊ | 2979/3690 [50:16<12:11, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▊ | 2981/3690 [50:18<12:30, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▊ | 2983/3690 [50:20<11:58, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▉ | 2985/3690 [50:22<11:49, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 81%|███████████████████████████████████████████████████████████████▉ | 2987/3690 [50:24<11:15, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|███████████████████████████████████████████████████████████████▉ | 2989/3690 [50:26<11:11, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████ | 2991/3690 [50:28<11:09, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████ | 2993/3690 [50:30<11:36, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████ | 2995/3690 [50:31<11:18, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 81%|████████████████████████████████████████████████████████████████▏ | 2997/3690 [50:34<11:37, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████▏ | 2999/3690 [50:36<11:46, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████▏ | 3001/3690 [50:38<12:23, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████▎ | 3003/3690 [50:40<12:18, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 81%|████████████████████████████████████████████████████████████████▎ | 3005/3690 [50:42<12:08, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 81%|████████████████████████████████████████████████████████████████▍ | 3007/3690 [50:44<11:43, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▍ | 3009/3690 [50:46<11:51, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▍ | 3011/3690 [50:48<11:30, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▌ | 3013/3690 [50:50<11:47, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▌ | 3015/3690 [50:52<11:29, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 82%|████████████████████████████████████████████████████████████████▌ | 3017/3690 [50:54<10:59, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▌ | 3018/3690 [50:55<11:24, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▋ | 3020/3690 [50:58<12:06, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▋ | 3022/3690 [51:00<12:06, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▋ | 3024/3690 [51:02<11:13, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 82%|████████████████████████████████████████████████████████████████▊ | 3026/3690 [51:04<11:28, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▊ | 3028/3690 [51:06<10:38, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▊ | 3030/3690 [51:08<11:18, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▉ | 3032/3690 [51:10<11:12, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|████████████████████████████████████████████████████████████████▉ | 3034/3690 [51:12<10:42, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 82%|█████████████████████████████████████████████████████████████████ | 3037/3690 [51:14<10:17, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|█████████████████████████████████████████████████████████████████ | 3039/3690 [51:16<10:18, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|█████████████████████████████████████████████████████████████████ | 3041/3690 [51:18<10:07, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 82%|█████████████████████████████████████████████████████████████████▏ | 3043/3690 [51:20<10:48, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▏ | 3045/3690 [51:22<09:54, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 83%|█████████████████████████████████████████████████████████████████▏ | 3047/3690 [51:24<10:12, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▎ | 3049/3690 [51:26<10:10, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▎ | 3051/3690 [51:28<10:21, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▎ | 3053/3690 [51:30<09:55, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▍ | 3055/3690 [51:32<09:58, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 83%|█████████████████████████████████████████████████████████████████▍ | 3057/3690 [51:34<10:17, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▍ | 3059/3690 [51:35<09:53, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▌ | 3061/3690 [51:37<10:10, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▌ | 3063/3690 [51:39<09:44, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▌ | 3065/3690 [51:41<10:07, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 83%|█████████████████████████████████████████████████████████████████▋ | 3067/3690 [51:43<10:30, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▋ | 3069/3690 [51:45<10:33, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▋ | 3071/3690 [51:47<09:49, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▊ | 3073/3690 [51:49<10:29, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▊ | 3075/3690 [51:51<10:31, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 83%|█████████████████████████████████████████████████████████████████▉ | 3077/3690 [51:54<10:30, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▉ | 3079/3690 [51:55<10:04, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 83%|█████████████████████████████████████████████████████████████████▉ | 3081/3690 [51:57<10:01, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████ | 3083/3690 [51:59<09:37, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████ | 3086/3690 [52:02<09:30, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 84%|██████████████████████████████████████████████████████████████████ | 3088/3690 [52:04<09:39, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▏ | 3090/3690 [52:06<09:41, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▏ | 3092/3690 [52:08<09:28, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▏ | 3094/3690 [52:10<09:40, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▎ | 3096/3690 [52:12<09:47, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 84%|██████████████████████████████████████████████████████████████████▎ | 3098/3690 [52:14<09:37, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▎ | 3100/3690 [52:16<10:12, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▍ | 3102/3690 [52:18<09:53, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▍ | 3104/3690 [52:20<10:17, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▍ | 3106/3690 [52:22<10:20, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 84%|██████████████████████████████████████████████████████████████████▌ | 3108/3690 [52:24<10:16, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▌ | 3110/3690 [52:26<09:49, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▋ | 3112/3690 [52:28<09:30, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▋ | 3114/3690 [52:30<09:24, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 84%|██████████████████████████████████████████████████████████████████▋ | 3116/3690 [52:32<09:50, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 84%|██████████████████████████████████████████████████████████████████▊ | 3118/3690 [52:34<09:27, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|██████████████████████████████████████████████████████████████████▊ | 3120/3690 [52:36<09:00, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|██████████████████████████████████████████████████████████████████▊ | 3122/3690 [52:38<09:03, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|██████████████████████████████████████████████████████████████████▉ | 3124/3690 [52:40<08:58, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|██████████████████████████████████████████████████████████████████▉ | 3126/3690 [52:42<08:54, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 85%|██████████████████████████████████████████████████████████████████▉ | 3128/3690 [52:44<08:57, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████ | 3130/3690 [52:46<09:28, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████ | 3132/3690 [52:48<09:24, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████ | 3134/3690 [52:50<09:39, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▏ | 3136/3690 [52:52<09:21, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 85%|███████████████████████████████████████████████████████████████████▏ | 3138/3690 [52:54<09:28, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▏ | 3140/3690 [52:56<09:20, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▎ | 3142/3690 [52:58<09:36, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▎ | 3144/3690 [53:00<09:43, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▎ | 3146/3690 [53:02<09:10, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 85%|███████████████████████████████████████████████████████████████████▍ | 3148/3690 [53:04<09:13, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▍ | 3150/3690 [53:06<09:09, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▍ | 3152/3690 [53:09<09:06, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 85%|███████████████████████████████████████████████████████████████████▌ | 3154/3690 [53:11<09:05, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▌ | 3156/3690 [53:13<09:23, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 86%|███████████████████████████████████████████████████████████████████▌ | 3158/3690 [53:15<09:00, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▋ | 3160/3690 [53:16<08:26, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▋ | 3163/3690 [53:19<08:06, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▋ | 3164/3690 [53:20<08:39, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▊ | 3166/3690 [53:22<08:49, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 86%|███████████████████████████████████████████████████████████████████▊ | 3168/3690 [53:24<08:35, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▊ | 3170/3690 [53:26<08:51, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▉ | 3173/3690 [53:29<08:14, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▉ | 3175/3690 [53:31<08:16, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|███████████████████████████████████████████████████████████████████▉ | 3176/3690 [53:32<08:31, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 86%|████████████████████████████████████████████████████████████████████ | 3178/3690 [53:34<08:57, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|████████████████████████████████████████████████████████████████████ | 3180/3690 [53:36<08:34, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|████████████████████████████████████████████████████████████████████▏ | 3183/3690 [53:39<08:08, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|████████████████████████████████████████████████████████████████████▏ | 3185/3690 [53:41<07:45, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|████████████████████████████████████████████████████████████████████▏ | 3187/3690 [53:43<08:00, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 86%|████████████████████████████████████████████████████████████████████▎ | 3189/3690 [53:45<08:13, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 86%|████████████████████████████████████████████████████████████████████▎ | 3191/3690 [53:47<07:58, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▎ | 3193/3690 [53:49<07:37, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▍ | 3195/3690 [53:51<07:53, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▍ | 3197/3690 [53:53<07:53, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 87%|████████████████████████████████████████████████████████████████████▍ | 3199/3690 [53:55<07:54, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▌ | 3201/3690 [53:57<07:57, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▌ | 3203/3690 [53:59<08:05, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▋ | 3206/3690 [54:01<07:35, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▋ | 3208/3690 [54:03<07:59, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 87%|████████████████████████████████████████████████████████████████████▋ | 3209/3690 [54:04<08:09, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▊ | 3212/3690 [54:07<07:56, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▊ | 3214/3690 [54:09<07:30, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▊ | 3216/3690 [54:11<07:26, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▉ | 3218/3690 [54:13<07:48, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 87%|████████████████████████████████████████████████████████████████████▉ | 3220/3690 [54:15<08:01, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|████████████████████████████████████████████████████████████████████▉ | 3222/3690 [54:17<08:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|█████████████████████████████████████████████████████████████████████ | 3224/3690 [54:19<07:54, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|█████████████████████████████████████████████████████████████████████ | 3226/3690 [54:22<08:13, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 87%|█████████████████████████████████████████████████████████████████████ | 3228/3690 [54:24<08:08, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 88%|█████████████████████████████████████████████████████████████████████▏ | 3229/3690 [54:25<08:04, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▏ | 3232/3690 [54:28<07:49, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▏ | 3233/3690 [54:29<07:46, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▎ | 3235/3690 [54:31<07:55, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▎ | 3237/3690 [54:33<08:07, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 88%|█████████████████████████████████████████████████████████████████████▎ | 3239/3690 [54:35<08:13, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▍ | 3241/3690 [54:37<08:05, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▍ | 3242/3690 [54:39<08:25, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▍ | 3244/3690 [54:41<08:32, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▍ | 3246/3690 [54:43<08:13, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 88%|█████████████████████████████████████████████████████████████████████▌ | 3248/3690 [54:45<08:13, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▌ | 3250/3690 [54:48<08:17, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▌ | 3251/3690 [54:49<08:10, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 3253/3690 [54:51<08:12, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 3255/3690 [54:53<08:19, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 88%|█████████████████████████████████████████████████████████████████████▋ | 3257/3690 [54:56<08:23, 1.16s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▊ | 3258/3690 [54:57<08:23, 1.17s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▊ | 3260/3690 [54:59<07:55, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▊ | 3262/3690 [55:01<07:48, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▉ | 3264/3690 [55:03<07:31, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 89%|█████████████████████████████████████████████████████████████████████▉ | 3266/3690 [55:05<07:36, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|█████████████████████████████████████████████████████████████████████▉ | 3268/3690 [55:07<07:19, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████ | 3270/3690 [55:09<07:15, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████ | 3272/3690 [55:12<07:30, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████ | 3274/3690 [55:14<07:40, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 89%|██████████████████████████████████████████████████████████████████████▏ | 3276/3690 [55:16<07:20, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▏ | 3277/3690 [55:17<07:35, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▏ | 3279/3690 [55:19<07:23, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▏ | 3281/3690 [55:21<07:18, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▎ | 3283/3690 [55:24<07:58, 1.18s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 89%|██████████████████████████████████████████████████████████████████████▎ | 3285/3690 [55:26<07:27, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▎ | 3287/3690 [55:28<06:58, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▍ | 3288/3690 [55:29<07:30, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▍ | 3290/3690 [55:32<07:19, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▍ | 3292/3690 [55:34<07:12, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 89%|██████████████████████████████████████████████████████████████████████▌ | 3294/3690 [55:36<06:58, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▌ | 3296/3690 [55:38<06:53, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▌ | 3297/3690 [55:40<08:42, 1.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▌ | 3298/3690 [55:41<08:46, 1.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 89%|██████████████████████████████████████████████████████████████████████▋ | 3300/3690 [55:44<08:14, 1.27s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 89%|██████████████████████████████████████████████████████████████████████▋ | 3302/3690 [55:46<07:27, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████████▋ | 3304/3690 [55:48<07:12, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████████▊ | 3306/3690 [55:50<06:37, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████████▊ | 3308/3690 [55:52<06:42, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████████▊ | 3309/3690 [55:53<06:50, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|██████████████████████████████████████████████████████████████████████▉ | 3311/3690 [55:55<06:38, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████████▉ | 3313/3690 [55:57<06:18, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|██████████████████████████████████████████████████████████████████████▉ | 3315/3690 [55:59<06:14, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████ | 3317/3690 [56:01<06:18, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████ | 3320/3690 [56:04<05:56, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████████ | 3321/3690 [56:05<06:12, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▏ | 3323/3690 [56:07<06:48, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▏ | 3325/3690 [56:09<06:27, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▏ | 3327/3690 [56:12<06:24, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▎ | 3329/3690 [56:14<06:21, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 90%|███████████████████████████████████████████████████████████████████████▎ | 3331/3690 [56:16<06:33, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▎ | 3333/3690 [56:18<06:24, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▍ | 3334/3690 [56:19<06:49, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▍ | 3336/3690 [56:22<06:57, 1.18s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 90%|███████████████████████████████████████████████████████████████████████▍ | 3338/3690 [56:24<06:15, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 90%|███████████████████████████████████████████████████████████████████████▍ | 3339/3690 [56:25<06:29, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▌ | 3341/3690 [56:28<07:01, 1.21s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▌ | 3343/3690 [56:30<06:43, 1.16s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▌ | 3345/3690 [56:32<06:18, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▋ | 3347/3690 [56:34<06:23, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 91%|███████████████████████████████████████████████████████████████████████▋ | 3349/3690 [56:36<06:13, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▋ | 3351/3690 [56:38<06:07, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▊ | 3352/3690 [56:39<06:04, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▊ | 3354/3690 [56:42<06:20, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▊ | 3356/3690 [56:44<06:19, 1.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 91%|███████████████████████████████████████████████████████████████████████▉ | 3358/3690 [56:46<06:10, 1.11s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▉ | 3359/3690 [56:47<06:11, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▉ | 3361/3690 [56:49<05:50, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|███████████████████████████████████████████████████████████████████████▉ | 3363/3690 [56:52<05:48, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|████████████████████████████████████████████████████████████████████████ | 3365/3690 [56:54<05:59, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 91%|████████████████████████████████████████████████████████████████████████ | 3367/3690 [56:56<05:38, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|████████████████████████████████████████████████████████████████████████▏ | 3369/3690 [56:58<05:35, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|████████████████████████████████████████████████████████████████████████▏ | 3371/3690 [57:00<06:01, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|████████████████████████████████████████████████████████████████████████▏ | 3373/3690 [57:02<05:32, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 91%|████████████████████████████████████████████████████████████████████████▎ | 3375/3690 [57:05<05:46, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|████████████████████████████████████████████████████████████████████████▎ | 3377/3690 [57:07<05:27, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▎ | 3379/3690 [57:08<05:03, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▍ | 3381/3690 [57:11<05:14, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▍ | 3382/3690 [57:12<05:28, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▍ | 3384/3690 [57:14<05:29, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|████████████████████████████████████████████████████████████████████████▍ | 3386/3690 [57:16<05:15, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▌ | 3388/3690 [57:18<05:48, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▌ | 3390/3690 [57:21<05:39, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▌ | 3391/3690 [57:22<05:19, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▋ | 3393/3690 [57:24<05:32, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|████████████████████████████████████████████████████████████████████████▋ | 3395/3690 [57:26<05:21, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▋ | 3397/3690 [57:28<05:21, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▊ | 3399/3690 [57:30<05:18, 1.09s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▊ | 3401/3690 [57:33<05:23, 1.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▊ | 3403/3690 [57:35<05:02, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|████████████████████████████████████████████████████████████████████████▉ | 3404/3690 [57:36<05:25, 1.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▉ | 3406/3690 [57:38<05:31, 1.17s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|████████████████████████████████████████████████████████████████████████▉ | 3408/3690 [57:41<05:23, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|█████████████████████████████████████████████████████████████████████████ | 3410/3690 [57:43<05:15, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 92%|█████████████████████████████████████████████████████████████████████████ | 3411/3690 [57:44<05:22, 1.16s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 92%|█████████████████████████████████████████████████████████████████████████ | 3413/3690 [57:46<05:18, 1.15s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████ | 3415/3690 [57:48<04:58, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▏ | 3417/3690 [57:50<04:39, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▏ | 3419/3690 [57:52<04:33, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▏ | 3421/3690 [57:54<04:16, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|█████████████████████████████████████████████████████████████████████████▎ | 3423/3690 [57:56<04:24, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▎ | 3425/3690 [57:58<04:20, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▎ | 3427/3690 [58:00<04:24, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▍ | 3429/3690 [58:02<04:20, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▍ | 3431/3690 [58:04<04:15, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|█████████████████████████████████████████████████████████████████████████▍ | 3433/3690 [58:06<04:24, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 3435/3690 [58:08<04:19, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 3437/3690 [58:10<04:09, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▋ | 3439/3690 [58:12<04:12, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▋ | 3441/3690 [58:14<04:15, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|█████████████████████████████████████████████████████████████████████████▋ | 3443/3690 [58:16<04:06, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▊ | 3445/3690 [58:18<04:05, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▊ | 3447/3690 [58:20<04:04, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▊ | 3449/3690 [58:22<04:01, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|█████████████████████████████████████████████████████████████████████████▉ | 3451/3690 [58:25<04:23, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|█████████████████████████████████████████████████████████████████████████▉ | 3453/3690 [58:27<04:10, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|█████████████████████████████████████████████████████████████████████████▉ | 3455/3690 [58:29<04:04, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████ | 3457/3690 [58:31<03:56, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████ | 3459/3690 [58:33<03:45, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████ | 3461/3690 [58:35<03:44, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 94%|██████████████████████████████████████████████████████████████████████████▏ | 3464/3690 [58:37<03:41, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▏ | 3466/3690 [58:40<03:45, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▏ | 3468/3690 [58:42<03:46, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 3469/3690 [58:43<03:53, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 3472/3690 [58:46<03:34, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▍ | 3474/3690 [58:47<03:31, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▍ | 3476/3690 [58:50<03:42, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▍ | 3478/3690 [58:52<03:30, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▌ | 3480/3690 [58:53<03:24, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▌ | 3482/3690 [58:56<03:38, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▌ | 3484/3690 [58:58<03:29, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▋ | 3486/3690 [58:59<03:11, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▋ | 3488/3690 [59:02<03:22, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▋ | 3490/3690 [59:03<03:13, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▊ | 3492/3690 [59:05<03:14, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 95%|██████████████████████████████████████████████████████████████████████████▊ | 3494/3690 [59:07<03:11, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▊ | 3496/3690 [59:09<03:07, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▉ | 3498/3690 [59:11<02:56, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▉ | 3500/3690 [59:13<03:02, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|██████████████████████████████████████████████████████████████████████████▉ | 3502/3690 [59:15<03:02, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 95%|███████████████████████████████████████████████████████████████████████████ | 3504/3690 [59:17<03:12, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████ | 3506/3690 [59:19<03:19, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████ | 3508/3690 [59:22<03:15, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████ | 3509/3690 [59:23<03:14, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████▏ | 3511/3690 [59:25<03:13, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 95%|███████████████████████████████████████████████████████████████████████████▏ | 3513/3690 [59:27<03:11, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████▎ | 3515/3690 [59:29<03:01, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████▎ | 3517/3690 [59:31<02:55, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████▎ | 3520/3690 [59:34<02:52, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 95%|███████████████████████████████████████████████████████████████████████████▍ | 3521/3690 [59:35<03:00, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 95%|███████████████████████████████████████████████████████████████████████████▍ | 3523/3690 [59:37<03:00, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▍ | 3525/3690 [59:39<02:55, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▍ | 3526/3690 [59:41<02:56, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▌ | 3528/3690 [59:43<02:52, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▌ | 3531/3690 [59:46<02:36, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|███████████████████████████████████████████████████████████████████████████▋ | 3533/3690 [59:48<02:34, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▋ | 3535/3690 [59:49<02:30, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▋ | 3537/3690 [59:51<02:30, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▊ | 3539/3690 [59:54<02:31, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▊ | 3541/3690 [59:56<02:30, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|███████████████████████████████████████████████████████████████████████████▊ | 3543/3690 [59:58<02:30, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|███████████████████████████████████████████████████████████████████████████▊ | 3544/3690 [59:59<02:32, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 3547/3690 [1:00:02<02:20, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 3549/3690 [1:00:03<02:13, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████ | 3551/3690 [1:00:05<02:12, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 96%|██████████████████████████████████████████████████████████████████████████▏ | 3553/3690 [1:00:07<02:16, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 3555/3690 [1:00:09<02:10, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▏ | 3557/3690 [1:00:11<02:12, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 96%|██████████████████████████████████████████████████████████████████████████▎ | 3559/3690 [1:00:14<02:15, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▎ | 3561/3690 [1:00:16<02:21, 1.10s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▎ | 3563/3690 [1:00:18<02:12, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 3565/3690 [1:00:20<02:02, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 3567/3690 [1:00:22<02:02, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▍ | 3569/3690 [1:00:24<02:04, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▌ | 3571/3690 [1:00:26<01:58, 1.00it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▌ | 3573/3690 [1:00:28<02:02, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▌ | 3575/3690 [1:00:30<01:59, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▌ | 3576/3690 [1:00:31<02:01, 1.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▋ | 3578/3690 [1:00:34<02:06, 1.13s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▋ | 3580/3690 [1:00:35<01:54, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▋ | 3582/3690 [1:00:37<01:50, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▊ | 3585/3690 [1:00:40<01:36, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▊ | 3587/3690 [1:00:42<01:38, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 3589/3690 [1:00:44<01:33, 1.08it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|██████████████████████████████████████████████████████████████████████████▉ | 3591/3690 [1:00:46<01:41, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 97%|██████████████████████████████████████████████████████████████████████████▉ | 3593/3690 [1:00:48<01:28, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|███████████████████████████████████████████████████████████████████████████ | 3595/3690 [1:00:50<01:32, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 97%|███████████████████████████████████████████████████████████████████████████ | 3597/3690 [1:00:52<01:34, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████ | 3599/3690 [1:00:54<01:29, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▏ | 3601/3690 [1:00:56<01:25, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▏ | 3603/3690 [1:00:58<01:26, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▏ | 3605/3690 [1:01:00<01:23, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 3607/3690 [1:01:01<01:19, 1.04it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 3609/3690 [1:01:03<01:19, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▎ | 3611/3690 [1:01:06<01:23, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 98%|███████████████████████████████████████████████████████████████████████████▍ | 3613/3690 [1:01:08<01:20, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 3615/3690 [1:01:10<01:16, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▍ | 3617/3690 [1:01:12<01:16, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 3619/3690 [1:01:14<01:09, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▌ | 3621/3690 [1:01:16<01:07, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▌ | 3623/3690 [1:01:18<01:08, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 3625/3690 [1:01:20<01:08, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 3627/3690 [1:01:22<01:06, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 3629/3690 [1:01:24<01:02, 1.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▋ | 3630/3690 [1:01:25<01:02, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 98%|███████████████████████████████████████████████████████████████████████████▊ | 3632/3690 [1:01:27<01:01, 1.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 98%|███████████████████████████████████████████████████████████████████████████▊ | 3634/3690 [1:01:29<00:57, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▉ | 3637/3690 [1:01:32<00:51, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▉ | 3638/3690 [1:01:33<00:52, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|███████████████████████████████████████████████████████████████████████████▉ | 3641/3690 [1:01:36<00:48, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 99%|███████████████████████████████████████████████████████████████████████████▉ | 3642/3690 [1:01:38<00:48, 1.00s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████ | 3645/3690 [1:01:40<00:43, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████ | 3647/3690 [1:01:42<00:42, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 3649/3690 [1:01:44<00:38, 1.07it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▏| 3651/3690 [1:01:46<00:38, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 99%|████████████████████████████████████████████████████████████████████████████▏| 3653/3690 [1:01:48<00:36, 1.01it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▎| 3655/3690 [1:01:50<00:36, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▎| 3657/3690 [1:01:53<00:34, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▎| 3658/3690 [1:01:54<00:34, 1.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▎| 3660/3690 [1:01:56<00:31, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 99%|████████████████████████████████████████████████████████████████████████████▍| 3662/3690 [1:01:58<00:29, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▍| 3664/3690 [1:02:00<00:26, 1.02s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▌| 3667/3690 [1:02:03<00:21, 1.05it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▌| 3669/3690 [1:02:04<00:19, 1.09it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 99%|████████████████████████████████████████████████████████████████████████████▌| 3671/3690 [1:02:06<00:16, 1.12it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+100%|████████████████████████████████████████████████████████████████████████████▋| 3673/3690 [1:02:08<00:16, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▋| 3675/3690 [1:02:10<00:15, 1.05s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▋| 3677/3690 [1:02:13<00:13, 1.04s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▊| 3679/3690 [1:02:14<00:10, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▊| 3681/3690 [1:02:16<00:09, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+100%|████████████████████████████████████████████████████████████████████████████▊| 3683/3690 [1:02:18<00:06, 1.06it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▉| 3685/3690 [1:02:20<00:05, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▉| 3687/3690 [1:02:22<00:02, 1.03it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|████████████████████████████████████████████████████████████████████████████▉| 3689/3690 [1:02:24<00:01, 1.01s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +100%|█████████████████████████████████████████████████████████████████████████████| 3690/3690 [1:02:25<00:00, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+100%|█████████████████████████████████████████████████████████████████████████████| 3690/3690 [1:02:25<00:00, 1.02it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Configuration saved in ./checkpoint-4000/config.json `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Feature extractor saved in ./checkpoint-4000/preprocessor_config.json `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +Feature extractor saved in ./checkpoint-4000/preprocessor_config.json `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
+To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +Feature extractor saved in ./checkpoint-4000/preprocessor_config.json `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...hEncoderDecoderModel.forward` and have been ignored: lang, length. If lang, length are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
+To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
+To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
+To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... +To disable this warning, you can either: + - Avoid using `tokenizers` before the fork if possible + - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) +huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible