diff --git "a/wandb/run-20220505_173748-b097rk18/files/output.log" "b/wandb/run-20220505_173748-b097rk18/files/output.log" --- "a/wandb/run-20220505_173748-b097rk18/files/output.log" +++ "b/wandb/run-20220505_173748-b097rk18/files/output.log" @@ -32670,3 +32670,4064 @@ Could not estimate the number of tokens of the input, floating-point operations 82%|████████████████████████████████████████████████████████████▉ | 4000/4860 [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... Model weights saved in ./checkpoint-4000/pytorch_model.bin███████▉ | 4000/4860 [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... Feature extractor saved in ./preprocessor_config.jsonl.bin███████▉ | 4000/4860 [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. 
Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [15:58:21<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +{'loss': 2.265, 'learning_rate': 5.965596330275229e-06, 'epoch': 2.47} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +{'loss': 2.1912, 'learning_rate': 5.9587155963302756e-06, 'epoch': 2.47} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +{'loss': 2.3031, 'learning_rate': 5.951834862385321e-06, 'epoch': 2.47} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]Saving model checkpoint to ./checkpoint-4000nt checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.242, 'learning_rate': 5.938073394495412e-06, 'epoch': 2.47} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1581, 'learning_rate': 5.931192660550459e-06, 'epoch': 2.47} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0825, 'learning_rate': 5.9243119266055045e-06, 'epoch': 2.47} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.216, 'learning_rate': 5.903669724770642e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0351, 'learning_rate': 5.889908256880734e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0389, 'learning_rate': 5.8761467889908255e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0654, 'learning_rate': 5.862385321100918e-06, 'epoch': 2.48} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:44:53, 11.50s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▏ | 4017/4860 [16:03:11<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:03:11<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:03:11<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0274, 'learning_rate': 5.848623853211009e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9662, 'learning_rate': 5.841743119266055e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8094, 'learning_rate': 5.8279816513761466e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9064, 'learning_rate': 5.821100917431193e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9551, 'learning_rate': 5.814220183486239e-06, 'epoch': 2.48} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.822, 'learning_rate': 5.807339449541285e-06, 'epoch': 2.48} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9694, 'learning_rate': 5.786697247706422e-06, 'epoch': 2.49} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6722, 'learning_rate': 5.7798165137614684e-06, 'epoch': 2.49} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.714, 'learning_rate': 5.772935779816513e-06, 'epoch': 2.49} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7326, 'learning_rate': 5.76605504587156e-06, 'epoch': 2.49} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.5729, 'learning_rate': 5.759174311926606e-06, 'epoch': 2.49} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6166, 'learning_rate': 5.752293577981652e-06, 'epoch': 2.49} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:49:28, 12.06s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0329, 'learning_rate': 5.697247706422018e-06, 'epoch': 2.49} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.823, 'learning_rate': 5.690366972477064e-06, 'epoch': 2.49} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3292, 'learning_rate': 5.6697247706422026e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2306, 'learning_rate': 5.649082568807339e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.125, 'learning_rate': 5.642201834862386e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2185, 'learning_rate': 5.635321100917431e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3527, 'learning_rate': 5.628440366972477e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:39, 6.26s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 83%|█████████████████████████████████████████████████████████████▋ | 4051/4860 [16:08:22<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1914, 'learning_rate': 5.607798165137615e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0974, 'learning_rate': 5.6009174311926604e-06, 'epoch': 2.5} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.225, 'learning_rate': 5.594036697247707e-06, 'epoch': 2.5} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2065, 'learning_rate': 5.580275229357798e-06, 'epoch': 2.5} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0812, 'learning_rate': 5.53211009174312e-06, 'epoch': 2.51} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:35:41, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 84%|█████████████████████████████████████████████████████████████▉ | 4065/4860 [16:10:41<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:10:41<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:10:41<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:04:19, 9.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|█████████████████████████████████████████████████████████████▉ | 4068/4860 [16:11:07<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9435, 'learning_rate': 5.490825688073395e-06, 'epoch': 2.51} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0221, 'learning_rate': 5.48394495412844e-06, 'epoch': 2.51} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8406, 'learning_rate': 5.477064220183487e-06, 'epoch': 2.51} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8137, 'learning_rate': 5.470183486238532e-06, 'epoch': 2.51} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8664, 'learning_rate': 5.463302752293578e-06, 'epoch': 2.51} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9112, 'learning_rate': 5.456422018348624e-06, 'epoch': 2.52} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8564, 'learning_rate': 5.442660550458716e-06, 'epoch': 2.52} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6841, 'learning_rate': 5.435779816513761e-06, 'epoch': 2.52} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7562, 'learning_rate': 5.428899082568808e-06, 'epoch': 2.52} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6522, 'learning_rate': 5.408256880733945e-06, 'epoch': 2.52} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:58:28, 8.98s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▏ | 4083/4860 [16:13:05<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▏ | 4083/4860 [16:13:05<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.7519, 'learning_rate': 5.346330275229358e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.4629, 'learning_rate': 5.339449541284404e-06, 'epoch': 2.53} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:34:49, 7.32s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 84%|██████████████████████████████████████████████████████████████▎ | 4094/4860 [16:14:46<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:14:46<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2029, 'learning_rate': 5.305045871559633e-06, 'epoch': 2.53} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2312, 'learning_rate': 5.298165137614679e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3773, 'learning_rate': 5.291284403669725e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3824, 'learning_rate': 5.277522935779816e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.166, 'learning_rate': 5.270642201834862e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1849, 'learning_rate': 5.2637614678899085e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9308, 'learning_rate': 5.256880733944955e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2545, 'learning_rate': 5.243119266055046e-06, 'epoch': 2.53} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0815, 'learning_rate': 5.236238532110092e-06, 'epoch': 2.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0915, 'learning_rate': 5.229357798165138e-06, 'epoch': 2.54} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1547, 'learning_rate': 5.2155963302752295e-06, 'epoch': 2.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9817, 'learning_rate': 5.1674311926605505e-06, 'epoch': 2.54} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8734, 'learning_rate': 5.153669724770643e-06, 'epoch': 2.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:34:15, 12.08s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 85%|██████████████████████████████████████████████████████████████▋ | 4120/4860 [16:19:15<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8829, 'learning_rate': 5.139908256880734e-06, 'epoch': 2.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.861, 'learning_rate': 5.1330275229357795e-06, 'epoch': 2.54} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8032, 'learning_rate': 5.119266055045872e-06, 'epoch': 2.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7708, 'learning_rate': 5.112385321100917e-06, 'epoch': 2.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8962, 'learning_rate': 5.105504587155964e-06, 'epoch': 2.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8115, 'learning_rate': 5.098623853211009e-06, 'epoch': 2.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.839, 'learning_rate': 5.091743119266056e-06, 'epoch': 2.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7772, 'learning_rate': 5.0711009174311926e-06, 'epoch': 2.55} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:49:46, 8.90s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████ | 4140/4860 [16:21:40<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 85%|███████████████████████████████████████████████████████████████ | 4140/4860 [16:21:40<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2829, 'learning_rate': 5.009174311926606e-06, 'epoch': 2.56} + 85%|███████████████████████████████████████████████████████████████ | 4140/4860 [16:21:40<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.7352, 'learning_rate': 5.002293577981651e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1744, 'learning_rate': 4.954128440366973e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1971, 'learning_rate': 4.947247706422018e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2201, 'learning_rate': 4.940366972477064e-06, 'epoch': 2.56} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.181, 'learning_rate': 4.93348623853211e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1874, 'learning_rate': 4.9266055045871565e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2029, 'learning_rate': 4.919724770642201e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0889, 'learning_rate': 4.912844036697248e-06, 'epoch': 2.56} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1723, 'learning_rate': 4.905963302752293e-06, 'epoch': 2.56} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0659, 'learning_rate': 4.8922018348623854e-06, 'epoch': 2.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2409, 'learning_rate': 4.885321100917431e-06, 'epoch': 2.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.103, 'learning_rate': 4.8784403669724775e-06, 'epoch': 2.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0346, 'learning_rate': 4.864678899082569e-06, 'epoch': 2.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1019, 'learning_rate': 4.857798165137614e-06, 'epoch': 2.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0501, 'learning_rate': 4.8440366972477065e-06, 'epoch': 2.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0905, 'learning_rate': 4.816513761467891e-06, 'epoch': 2.57} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8898, 'learning_rate': 4.802752293577982e-06, 'epoch': 2.57} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9328, 'learning_rate': 4.788990825688074e-06, 'epoch': 2.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8578, 'learning_rate': 4.782110091743119e-06, 'epoch': 2.58} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8338, 'learning_rate': 4.775229357798165e-06, 'epoch': 2.58} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:14:14, 6.19s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▌ | 4178/4860 [16:28:11<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:28:11<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:28:11<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:28:11<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.623, 'learning_rate': 4.658256880733945e-06, 'epoch': 2.59} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3705, 'learning_rate': 4.651376146788991e-06, 'epoch': 2.59} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.383, 'learning_rate': 4.637614678899083e-06, 'epoch': 2.59} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:20, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▉ | 4196/4860 [16:30:54<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:17:13, 12.40s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 86%|███████████████████████████████████████████████████████████████▉ | 4197/4860 [16:31:07<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2268, 'learning_rate': 4.610091743119266e-06, 'epoch': 2.59} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1454, 'learning_rate': 4.582568807339449e-06, 'epoch': 2.59} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1143, 'learning_rate': 4.575688073394496e-06, 'epoch': 2.59} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0436, 'learning_rate': 4.568807339449541e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.1584, 'learning_rate': 4.561926605504587e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1767, 'learning_rate': 4.555045871559633e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0763, 'learning_rate': 4.53440366972477e-06, 'epoch': 2.6} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<2:16:16, 12.33s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 87%|████████████████████████████████████████████████████████████████▏ | 4213/4860 [16:33:55<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:33:55<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:33:55<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:17, 9.77s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▏ | 4215/4860 [16:34:13<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9071, 'learning_rate': 4.486238532110092e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1895, 'learning_rate': 4.4724770642201834e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8468, 'learning_rate': 4.46559633027523e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0438, 'learning_rate': 4.4587155963302755e-06, 'epoch': 2.6} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7339, 'learning_rate': 4.451834862385321e-06, 'epoch': 2.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8835, 'learning_rate': 4.444954128440367e-06, 'epoch': 2.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7704, 'learning_rate': 4.431192660550459e-06, 'epoch': 2.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8735, 'learning_rate': 4.4243119266055045e-06, 'epoch': 2.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7917, 'learning_rate': 4.41743119266055e-06, 'epoch': 2.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:41:34, 9.45s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▍ | 4228/4860 [16:36:00<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▍ | 4228/4860 [16:36:00<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6564, 'learning_rate': 4.403669724770642e-06, 'epoch': 2.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:36:00<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.5854, 'learning_rate': 4.38302752293578e-06, 'epoch': 2.61} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6481, 'learning_rate': 4.376146788990826e-06, 'epoch': 2.61} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:22:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 4240/4860 [16:37:21<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 87%|████████████████████████████████████████████████████████████████▌ | 4240/4860 [16:37:21<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.1647, 'learning_rate': 4.321100917431193e-06, 'epoch': 2.62} + 87%|████████████████████████████████████████████████████████████████▌ | 4240/4860 [16:37:21<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.613, 'learning_rate': 4.314220183486239e-06, 'epoch': 2.62} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0898, 'learning_rate': 4.26605504587156e-06, 'epoch': 2.62} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.149, 'learning_rate': 4.259174311926605e-06, 'epoch': 2.62} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.2782, 'learning_rate': 4.252293577981652e-06, 'epoch': 2.62} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1831, 'learning_rate': 4.245412844036697e-06, 'epoch': 2.62} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1947, 'learning_rate': 4.238532110091744e-06, 'epoch': 2.62} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.1111, 'learning_rate': 4.2316513761467886e-06, 'epoch': 2.63} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2748, 'learning_rate': 4.224770642201835e-06, 'epoch': 2.63} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1529, 'learning_rate': 4.217889908256881e-06, 'epoch': 2.63} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0317, 'learning_rate': 4.211009174311927e-06, 'epoch': 2.63} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.12, 'learning_rate': 4.197247706422018e-06, 'epoch': 2.63} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1827, 'learning_rate': 4.1834862385321104e-06, 'epoch': 2.63} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9389, 'learning_rate': 4.1353211009174315e-06, 'epoch': 2.63} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0647, 'learning_rate': 4.121559633027523e-06, 'epoch': 2.64} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.9626, 'learning_rate': 4.114678899082568e-06, 'epoch': 2.64} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8267, 'learning_rate': 4.107798165137615e-06, 'epoch': 2.64} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8445, 'learning_rate': 4.100917431192661e-06, 'epoch': 2.64} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.844, 'learning_rate': 4.094036697247706e-06, 'epoch': 2.64} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.7624, 'learning_rate': 4.0871559633027525e-06, 'epoch': 2.64} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8293, 'learning_rate': 4.0733944954128446e-06, 'epoch': 2.64} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.5491, 'learning_rate': 4.025229357798166e-06, 'epoch': 2.64} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:03:12, 6.12s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▎ | 4288/4860 [16:44:58<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.3478, 'learning_rate': 3.9564220183486235e-06, 'epoch': 2.65} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2748, 'learning_rate': 3.94954128440367e-06, 'epoch': 2.65} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:02:28, 6.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 88%|█████████████████████████████████████████████████████████████████▍ | 4295/4860 [16:46:19<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:46:19<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:46:19<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.145, 'learning_rate': 3.935779816513762e-06, 'epoch': 2.65} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2242, 'learning_rate': 3.928899082568807e-06, 'epoch': 2.65} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.2247, 'learning_rate': 3.922018348623853e-06, 'epoch': 2.65} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.154, 'learning_rate': 3.915137614678899e-06, 'epoch': 2.65} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2411, 'learning_rate': 3.908256880733945e-06, 'epoch': 2.65} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:55:40, 12.28s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 4302/4860 [16:47:41<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1862, 'learning_rate': 3.887614678899083e-06, 'epoch': 2.66} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2539, 'learning_rate': 3.873853211009174e-06, 'epoch': 2.66} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:37, 11.36s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▌ | 4306/4860 [16:48:23<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:48:23<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:48:23<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:48:23<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0428, 'learning_rate': 3.839449541284403e-06, 'epoch': 2.66} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0056, 'learning_rate': 3.83256880733945e-06, 'epoch': 2.66} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:39:23, 10.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▋ | 4313/4860 [16:49:32<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:49:32<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:49:32<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:28:58, 9.76s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▋ | 4315/4860 [16:49:50<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:49:50<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[16:49:50<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1032, 'learning_rate': 3.798165137614679e-06, 'epoch': 2.66} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:25:54, 9.46s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▋ | 4318/4860 [16:50:17<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8577, 'learning_rate': 3.7775229357798168e-06, 'epoch': 2.67} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.9121, 'learning_rate': 3.763761467889908e-06, 'epoch': 2.67} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9352, 'learning_rate': 3.756880733944954e-06, 'epoch': 2.67} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7315, 'learning_rate': 3.75e-06, 'epoch': 2.67} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7162, 'learning_rate': 3.743119266055046e-06, 'epoch': 2.67} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8008, 'learning_rate': 3.7362385321100918e-06, 'epoch': 2.67} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7511, 'learning_rate': 3.729357798165138e-06, 'epoch': 2.67} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8311, 'learning_rate': 3.7224770642201834e-06, 'epoch': 2.67} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:21:34, 9.03s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▉ | 4328/4860 [16:51:38<1:09:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:51:38<1:09:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:51:38<1:09:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:27, 7.83s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|█████████████████████████████████████████████████████████████████▉ | 4329/4860 [16:51:45<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[16:51:45<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.3574, 'learning_rate': 3.6605504587155965e-06, 'epoch': 2.68} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:08:36, 7.75s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|███████████████████████████████████████████████████████████████████▊ | 4339/4860 [16:52:53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2203, 'learning_rate': 3.6330275229357803e-06, 'epoch': 2.68} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.5337, 'learning_rate': 3.6123853211009176e-06, 'epoch': 2.68} +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0985, 'learning_rate': 3.5986238532110092e-06, 'epoch': 2.68} +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed53<55:24, 6.38s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 89%|██████████████████████████████████████████████████████████████████▏ | 4346/4860 [16:54:21<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0257, 'learning_rate': 3.584862385321101e-06, 'epoch': 2.68} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.2663, 'learning_rate': 3.5711009174311925e-06, 'epoch': 2.68} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:45:56, 12.37s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▏ | 4350/4860 [16:55:08<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 90%|██████████████████████████████████████████████████████████████████▏ | 4350/4860 [16:55:08<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0192, 'learning_rate': 3.5573394495412846e-06, 'epoch': 2.69} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2417, 'learning_rate': 3.5504587155963307e-06, 'epoch': 2.69} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2602, 'learning_rate': 3.5435779816513763e-06, 'epoch': 2.69} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.075, 'learning_rate': 3.5366972477064223e-06, 'epoch': 2.69} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.051, 'learning_rate': 3.522935779816514e-06, 'epoch': 2.69} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:40:30, 11.82s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 90%|██████████████████████████████████████████████████████████████████▎ | 4357/4860 [16:56:23<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:56:23<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:56:23<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [16:56:23<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0667, 'learning_rate': 3.5091743119266056e-06, 'epoch': 2.69} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0288, 'learning_rate': 3.4954128440366977e-06, 'epoch': 2.69} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0973, 'learning_rate': 3.4885321100917434e-06, 'epoch': 2.69} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0677, 'learning_rate': 3.474770642201835e-06, 'epoch': 2.69} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0187, 'learning_rate': 3.4610091743119267e-06, 'epoch': 2.69} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8184, 'learning_rate': 3.4403669724770644e-06, 'epoch': 2.7} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.8973, 'learning_rate': 3.43348623853211e-06, 'epoch': 2.7} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9186, 'learning_rate': 3.426605504587156e-06, 'epoch': 2.7} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7952, 'learning_rate': 3.419724770642202e-06, 'epoch': 2.7} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8999, 'learning_rate': 3.412844036697248e-06, 'epoch': 2.7} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9107, 'learning_rate': 3.3990825688073398e-06, 'epoch': 2.7} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:29:54, 10.72s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 1.6683, 'learning_rate': 3.378440366972477e-06, 'epoch': 2.7}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 90%|██████████████████████████████████████████████████████████████████▋ | 4378/4860 [16:59:27<1:02:33, 7.79s/it]
+{'loss': 1.6626, 'learning_rate': 3.371559633027523e-06, 'epoch': 2.7}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 90%|████████████████████████████████████████████████████████████████████▌ | 4384/4860 [17:00:11<57:12, 7.21s/it]
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.2863, 'learning_rate': 3.2889908256880735e-06, 'epoch': 2.71}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.7415, 'learning_rate': 3.282110091743119e-06, 'epoch': 2.71}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.3943, 'learning_rate': 3.268348623853211e-06, 'epoch': 2.71}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.2061, 'learning_rate': 3.254587155963303e-06, 'epoch': 2.71}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1599, 'learning_rate': 3.247706422018349e-06, 'epoch': 2.71}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 90%|██████████████████████████████████████████████████████████████████▉ | 4397/4860 [17:02:21<1:32:10, 11.95s/it]
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1942, 'learning_rate': 3.2339449541284406e-06, 'epoch': 2.71}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1437, 'learning_rate': 3.227064220183486e-06, 'epoch': 2.72}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.2261, 'learning_rate': 3.2201834862385322e-06, 'epoch': 2.72}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 2.1839, 'learning_rate': 3.213302752293578e-06, 'epoch': 2.72}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.9636, 'learning_rate': 3.19954128440367e-06, 'epoch': 2.72}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.982, 'learning_rate': 3.192660550458716e-06, 'epoch': 2.72}
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1982, 'learning_rate': 3.1857798165137616e-06, 'epoch': 2.72}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1514, 'learning_rate': 3.1788990825688076e-06, 'epoch': 2.72} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9799, 'learning_rate': 3.1720183486238533e-06, 'epoch': 2.72} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9757, 'learning_rate': 3.1651376146788993e-06, 'epoch': 2.72} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9922, 'learning_rate': 3.151376146788991e-06, 'epoch': 2.72} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9657, 'learning_rate': 3.1238532110091747e-06, 'epoch': 2.72} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.0713, 'learning_rate': 3.1100917431192664e-06, 'epoch': 2.73} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9186, 'learning_rate': 3.103211009174312e-06, 'epoch': 2.73} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9285, 'learning_rate': 3.096330275229358e-06, 'epoch': 2.73} +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8838, 'learning_rate': 3.0894495412844036e-06, 'epoch': 2.73} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8975, 'learning_rate': 3.0756880733944953e-06, 'epoch': 2.73} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0665, 'learning_rate': 3.0688073394495413e-06, 'epoch': 2.73} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8544, 'learning_rate': 3.0619266055045874e-06, 'epoch': 2.73} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:32:10, 11.95s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▏ | 4424/4860 [17:06:32<57:04, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...60 [17:06:32<57:04, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<57:04, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed32<57:04, 7.86s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▏ | 4425/4860 [17:06:40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▏ | 4425/4860 [17:06:40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6892, 'learning_rate': 3.0275229357798168e-06, 'epoch': 2.73} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed40<57:07, 7.88s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▎ | 4432/4860 [17:07:29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2495, 'learning_rate': 2.944954128440367e-06, 'epoch': 2.74} +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed29<50:25, 7.07s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 91%|█████████████████████████████████████████████████████████████████████▍ | 4441/4860 [17:08:32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 91%|█████████████████████████████████████████████████████████████████████▍ | 4441/4860 [17:08:32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.4565, 'learning_rate': 2.931192660550459e-06, 'epoch': 2.74} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2403, 'learning_rate': 2.917431192660551e-06, 'epoch': 2.74} +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.4559, 'learning_rate': 2.9105504587155965e-06, 'epoch': 2.74} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1723, 'learning_rate': 2.9036697247706426e-06, 'epoch': 2.74} +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1987, 'learning_rate': 2.896788990825688e-06, 'epoch': 2.75} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.0769, 'learning_rate': 2.8899082568807342e-06, 'epoch': 2.75} +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0708, 'learning_rate': 2.88302752293578e-06, 'epoch': 2.75} +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1007, 'learning_rate': 2.876146788990826e-06, 'epoch': 2.75} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1918, 'learning_rate': 2.8692660550458715e-06, 'epoch': 2.75} +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1238, 'learning_rate': 2.8623853211009175e-06, 'epoch': 2.75} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1553, 'learning_rate': 2.855504587155963e-06, 'epoch': 2.75} +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.2297, 'learning_rate': 2.848623853211009e-06, 'epoch': 2.75} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed32<59:14, 8.48s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|███████████████████████████████████████████████████████████████████▊ | 4455/4860 [17:11:11<1:09:50, 10.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [17:11:11<1:09:50, 10.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [17:11:11<1:09:50, 10.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:50, 10.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:50, 10.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:50, 10.35s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|███████████████████████████████████████████████████████████████████▊ | 4456/4860 [17:11:21<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [17:11:21<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [17:11:21<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9726, 'learning_rate': 2.7935779816513763e-06, 'epoch': 2.75} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.8671, 'learning_rate': 2.786697247706422e-06, 'epoch': 2.75} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9898, 'learning_rate': 2.772935779816514e-06, 'epoch': 2.76} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9593, 'learning_rate': 2.76605504587156e-06, 'epoch': 2.76} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0171, 'learning_rate': 2.7591743119266056e-06, 'epoch': 2.76} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.928, 'learning_rate': 2.7522935779816517e-06, 'epoch': 2.76} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.0607, 'learning_rate': 2.7454128440366973e-06, 'epoch': 2.76} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9178, 'learning_rate': 2.7385321100917433e-06, 'epoch': 2.76} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.9014, 'learning_rate': 2.731651376146789e-06, 'epoch': 2.76} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.7439, 'learning_rate': 2.7110091743119267e-06, 'epoch': 2.76} +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:09:22, 10.30s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████ | 4479/4860 [17:14:33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████ | 4479/4860 [17:14:33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.6463, 'learning_rate': 2.6628440366972477e-06, 'epoch': 2.77} +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.3782, 'learning_rate': 2.6284403669724775e-06, 'epoch': 2.77} +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed33<46:35, 7.34s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|██████████████████████████████████████████████████████████████████████▏ | 4488/4860 [17:15:31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 92%|██████████████████████████████████████████████████████████████████████▏ | 4488/4860 [17:15:31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.0852, 'learning_rate': 2.6077981651376147e-06, 'epoch': 2.77} +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 1.2689, 'learning_rate': 2.600917431192661e-06, 'epoch': 2.77} +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed31<38:03, 6.14s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 92%|████████████████████████████████████████████████████████████████████▍ | 4494/4860 [17:16:37<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.1764, 'learning_rate': 2.5665137614678897e-06, 'epoch': 2.77} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1951, 'learning_rate': 2.559633027522936e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1474, 'learning_rate': 2.545871559633028e-06, 'epoch': 2.78} +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+Could not estimate the number of tokens of the input, floating-point operations will not be computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed<1:10:26, 11.55s/it]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [17:17:45<1:07:51, 11.31s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 9%|███████▌ | 174/1845 [04:00<37:47, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 14%|███████████▌ | 266/1845 [06:09<37:55, 1.44s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 21%|████████████████▊ | 389/1845 [09:00<31:12, 1.29s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 32%|█████████████████████████▉ | 598/1845 [13:46<28:24, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 32%|█████████████████████████▉ | 598/1845 [13:46<28:24, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|█████████████��████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 38%|██████████████████████████████▏ | 697/1845 [16:03<27:50, 1.45s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. 
If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 56%|███████████████████████████████████████████▊ | 1024/1845 [23:33<19:09, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 65%|███████████████████████████████████████████████████▍ | 1200/1845 [27:32<14:08, 1.32s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 72%|█████████████████████████████████████████████████████████▏ | 1335/1845 [30:38<11:41, 1.37s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 73%|██████████████████████████████████████████████████████████ | 1355/1845 [31:04<10:14, 1.25s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 75%|███████████████████████████████████████████████████████████▎ | 1384/1845 [31:42<10:25, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 84%|██████████████████████████████████████████████████████████████████▍ | 1553/1845 [35:32<06:48, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 88%|█████████████████████████████████████████████████████████████████████▋ | 1627/1845 [37:12<04:57, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|█████████████████████████████████████████████████████████████████████████▌ | 1717/1845 [39:13<02:54, 1.36s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|█████████████████████████████████████████████████████��████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|███████████████████████████████████████████████████████████��██████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message.
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. 
+ 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 94%|██████████████████████████████████████████████████████████████████████████▎ | 1735/1845 [39:38<02:33, 1.40s/it]The following columns in the evaluation set don't have a corresponding argument in `SpeechEncoderDecoderModel.forward` and have been ignored: length, lang. If length, lang are not expected by `SpeechEncoderDecoderModel.forward`, you can safely ignore this message. + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [18:00:01<1:07:51, 11.31s/it]Saving model checkpoint to ./checkpoint-4500 + 93%|████████████████████████████████████████████████████████████████████▌ | 4500/4860 [18:00:01<1:07:51, 11.31s/it]Saving model checkpoint to ./checkpoint-4500 +Model weights saved in ./checkpoint-4500/pytorch_model.bin
+Feature extractor saved in ./preprocessor_config.json +Adding files tracked by Git LFS: ['wandb/run-20220505_173748-b097rk18/logs/debug-internal.log']. This may take a bit of time if the files are large. +Adding files tracked by Git LFS: ['wandb/run-20220505_173748-b097rk18/logs/debug-internal.log']. This may take a bit of time if the files are large.