nohup: ignoring input
wandb: Currently logged in as: sanchit-gandhi. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.12.17
wandb: Run data is saved locally in /home/sanchitgandhi/train-flax-wav2vec2-2-bart-large-cv9-feature-encoder/wandb/run-20220531_135814-146ecm8l
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run flax-wav2vec2-2-bart-large-cv9-feature-encoder
wandb: ⭐️ View project at https://wandb.ai/sanchit-gandhi/commonvoice_9_0
wandb: 🚀 View run at https://wandb.ai/sanchit-gandhi/commonvoice_9_0/runs/146ecm8l
05/31/2022 13:58:20 - INFO - __main__ - Training/evaluation parameters FlaxSeq2SeqTrainingArguments(
_n_gpu=-1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=,
deepspeed=None,
disable_tqdm=None,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10000,
evaluation_strategy=no,
final_generation_max_length=200,
final_generation_num_beams=5,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
generation_length_penalty=1.2,
generation_max_length=40,
generation_num_beams=1,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=input_length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=None,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=25,
logging_strategy=steps,
lr_scheduler_type=linear,
matmul_precision=default,
max_grad_norm=1.0,
max_steps=50000,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
output_dir=./flax-wav2vec2-2-bart-large-cv9-feature-encoder,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
precision=full,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
remove_unused_columns=True,
report_to=None,
resume_from_checkpoint=None,
run_name=None,
save_on_each_node=False,
save_steps=10000,
save_strategy=steps,
save_total_limit=1,
seed=42,
sharded_ddp=,
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=500,
weight_decay=0.0,
xpu_backend=None,
)
05/31/2022 13:58:20 - INFO - __main__ - JAX devices: 8, matmul precision: default
05/31/2022 13:58:22 - WARNING - datasets.builder - Reusing dataset common_voice_9_0 (/home/sanchitgandhi/cache/huggingface/datasets/mozilla-foundation___common_voice_9_0/en/9.0.0/26f54721b57ee2f31a333b315ed9151fbd8e693a3983c295fef63c67a12b9bf7)
05/31/2022 13:58:24 - WARNING - datasets.builder - Reusing dataset common_voice_9_0 (/home/sanchitgandhi/cache/huggingface/datasets/mozilla-foundation___common_voice_9_0/en/9.0.0/26f54721b57ee2f31a333b315ed9151fbd8e693a3983c295fef63c67a12b9bf7)
05/31/2022 13:58:26 - WARNING - datasets.builder - Reusing dataset common_voice_9_0 (/home/sanchitgandhi/cache/huggingface/datasets/mozilla-foundation___common_voice_9_0/en/9.0.0/26f54721b57ee2f31a333b315ed9151fbd8e693a3983c295fef63c67a12b9bf7)
loading configuration file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/config.json from cache at /home/sanchitgandhi/.cache/huggingface/transformers/e6d3af8a2b6624d8adf8fc289717c121400164223b3e51d49b639aa34d1d3048.c9a58c9120361b7b034a0136cc74d5dce009e745c4cc111c255d5f3d0a9e2fd9
/home/sanchitgandhi/transformers/src/transformers/configuration_utils.py:358: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
  warnings.warn(
Model config SpeechEncoderDecoderConfig {
"_name_or_path": "sanchit-gandhi/flax-wav2vec2-2-bart-large-scan", "architectures": [ "SpeechEncoderDecoderModel" ], "decoder": { "_name_or_path": "", "activation_dropout": 0.1, "activation_function": "gelu", "add_bias_logits": false, "add_cross_attention": true, "add_final_layer_norm": false, "architectures": [ "BartModel" ], "attention_dropout": 0.1, "bad_words_ids": null, "bos_token_id": 0, "chunk_size_feed_forward": 0, "classif_dropout": 0.1, "classifier_dropout": 0.0, "cross_attention_hidden_size": null, "d_model": 1024, "decoder_attention_heads": 16, "decoder_ffn_dim": 4096, "decoder_layerdrop": 0.0, "decoder_layers": 12, "decoder_start_token_id": 2, "diversity_penalty": 0.0, "do_sample": false, "dropout": 0.1, "early_stopping": true, "encoder_attention_heads": 16, "encoder_ffn_dim": 4096, "encoder_layerdrop": 0.0, "encoder_layers": 12, "encoder_no_repeat_ngram_size": 0, "eos_token_id": 2, "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": 0, "forced_eos_token_id": 2, "fuse_matmuls": false, "gradient_checkpointing": true, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2" }, "init_std": 0.02, "is_decoder": true, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2 }, "length_penalty": 1.0, "max_length": 20, "max_position_embeddings": 1024, "min_length": 0, "model_type": "bart", "no_repeat_ngram_size": 3, "normalize_before": false, "num_beam_groups": 1, "num_beams": 4, "num_hidden_layers": 12, "num_return_sequences": 1, "output_attentions": false,
"output_hidden_states": false, "output_scores": false, "pad_token_id": 1, "prefix": null, "problem_type": null, "pruned_heads": {}, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "scale_embedding": false, "sep_token_id": null, "task_specific_params": { "summarization": { "length_penalty": 1.0, "max_length": 128, "min_length": 12, "num_beams": 4 }, "summarization_cnn": { "length_penalty": 2.0, "max_length": 142, "min_length": 56, "num_beams": 4 }, "summarization_xsum": { "length_penalty": 1.0, "max_length": 62, "min_length": 11, "num_beams": 6 } }, "temperature": 1.0, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": "float32", "torchscript": false, "transformers_version": "4.18.0.dev0", "typical_p": 1.0, "use_bfloat16": false, "use_cache": true, "use_scan": true, "vocab_size": 50265 }, "decoder_start_token_id": 0, "encoder": { "_name_or_path": "", "activation_dropout": 0.1, "adapter_kernel_size": 3, "adapter_stride": 2, "add_adapter": true, "add_cross_attention": false, "apply_spec_augment": true, "architectures": [ "Wav2Vec2ForPreTraining" ], "attention_dropout": 0.1, "bad_words_ids": null, "bos_token_id": 1, "chunk_size_feed_forward": 0, "classifier_proj_size": 256, "codevector_dim": 768, "contrastive_logits_temperature": 0.1, "conv_bias": true, "conv_dim": [ 512, 512, 512, 512, 512, 512, 512 ], "conv_kernel": [ 10, 3, 3, 3, 3, 2, 2 ], "conv_stride": [ 5, 2, 2, 2, 2, 2, 2 ], "cross_attention_hidden_size": null, "ctc_loss_reduction": "sum", "ctc_zero_infinity": false, "decoder_start_token_id": null, "diversity_loss_weight": 0.1, "diversity_penalty": 0.0, "do_sample": false, "do_stable_layer_norm": true, "early_stopping": false, "encoder_no_repeat_ngram_size": 0, "eos_token_id": 2, "exponential_decay_length_penalty": null, "feat_extract_activation": "gelu", "feat_extract_dropout": 0.0, "feat_extract_norm": "layer", 
"feat_proj_dropout": 0.0, "feat_quantizer_dropout": 0.0, "final_dropout": 0.0, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "fuse_matmuls": false, "gradient_checkpointing": true, "hidden_act": "gelu", "hidden_dropout": 0.1, "hidden_dropout_prob": 0.1, "hidden_size": 1024, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "intermediate_size": 4096, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-05, "layerdrop": 0.0, "length_penalty": 1.0, "mask_feature_length": 10, "mask_feature_min_masks": 0, "mask_feature_prob": 0.0, "mask_time_length": 10, "mask_time_min_masks": 2, "mask_time_prob": 0.1, "max_length": 20, "min_length": 0, "model_type": "wav2vec2", "no_repeat_ngram_size": 0, "num_adapter_layers": 3, "num_attention_heads": 16, "num_beam_groups": 1, "num_beams": 1, "num_codevector_groups": 2, "num_codevectors_per_group": 320, "num_conv_pos_embedding_groups": 16, "num_conv_pos_embeddings": 128, "num_feat_extract_layers": 7, "num_hidden_layers": 24, "num_negatives": 100, "num_return_sequences": 1, "output_attentions": false, "output_hidden_size": 1024, "output_hidden_states": false, "output_scores": false, "pad_token_id": 0, "prefix": null, "problem_type": null, "proj_codevector_dim": 768, "pruned_heads": {}, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "sep_token_id": null, "task_specific_params": null, "tdnn_dilation": [ 1, 2, 3, 1, 1 ], "tdnn_dim": [ 512, 512, 512, 512, 1500 ], "tdnn_kernel": [ 5, 3, 3, 1, 1 ], "temperature": 1.0, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": null, "torchscript": false, "transformers_version": "4.18.0.dev0", "typical_p": 1.0, "use_bfloat16": false, "use_scan": true, "use_weighted_layer_sum": false, "vocab_size": 32, "xvector_output_dim": 512 }, 
"eos_token_id": 2, "is_encoder_decoder": true, "max_length": 40, "model_type": "speech-encoder-decoder", "pad_token_id": 1, "processor_class": "Wav2Vec2Processor", "tie_word_embeddings": false, "transformers_version": null, "use_cache": false } loading feature extractor configuration file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/preprocessor_config.json from cache at /home/sanchitgandhi/.cache/huggingface/transformers/bc2232c616201c7d3d66ba3f6a7d1186306134838dfb19786149f0e16122787d.bbc1eb890a39c82e710a893223b8452ac5b78e8b57083b2f893aa7dc59d4ed69 Feature extractor Wav2Vec2FeatureExtractor { "do_normalize": true, "feature_extractor_type": "Wav2Vec2FeatureExtractor", "feature_size": 1, "padding_side": "right", "padding_value": 0.0, "return_attention_mask": true, "sampling_rate": 16000 } loading file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/vocab.json from cache at /home/sanchitgandhi/.cache/huggingface/transformers/86c0de13925d1534934e540ff4c9dd778f49761b4eaf59dae3335a4f6690a814.bfdcc444ff249bca1a95ca170ec350b442f81804d7df3a95a2252217574121d7 loading file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/merges.txt from cache at /home/sanchitgandhi/.cache/huggingface/transformers/7cf4fc91891684e1177d1c519689e4c310ebdec965e00d6e45134bb9227ab01b.f5b91da9e34259b8f4d88dbc97c740667a0e8430b96314460cdb04e86d4fc435 loading file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/tokenizer.json from cache at /home/sanchitgandhi/.cache/huggingface/transformers/c02f3f3009bfacaa24cfead1d0f7fbf4fc2fb5f8092f68703449f02aa3a28e03.393fa6a095aa312a3cce4d5263e471bd94ec0215e6c63448a6464d59ff900814 loading file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/added_tokens.json from cache at None loading file 
https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/special_tokens_map.json from cache at /home/sanchitgandhi/.cache/huggingface/transformers/505d61b8f6e05764b5aec1483bfdd13a310681a5af54957263604323be3bbabf.a11ebb04664c067c8fe5ef8f8068b0f721263414a26058692f7b2e4ba2a1b342
loading file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/tokenizer_config.json from cache at /home/sanchitgandhi/.cache/huggingface/transformers/ff79c23164eac352d7f9651f3c3774a962ce80f81460d9e17d689235fa34ee80.0e8b2b497f91e23302894a5c1f19ced6334b0abd450a7bce75a67bf0f9ee5c54
loading weights file https://huggingface.co/sanchit-gandhi/flax-wav2vec2-2-bart-large-scan/resolve/main/flax_model.msgpack from cache at /home/sanchitgandhi/.cache/huggingface/transformers/1279dc21f7dd9ed546f166e7e445e068b2672ddfa5386b2e3a3a973b8d668365.8e03496bb6919447aeb468483249e7b65dfb59c42989be9787af0aa6aa9b3f50
tcmalloc: large alloc 2353618944 bytes == 0x557c26e0a000 @ 0x7f54eb2b3680 0x7f54eb2d4824 0x557b828ccd4b 0x557b8290d68a 0x557b829e43e8 0x557b8293fb6d 0x557b8281a34f 0x557b829fa18d 0x557b8293ff05 0x557b8289e72f 0x557b82935663 0x557b82936da9 0x557b8289d58e 0x557b82935663 0x557b82936354 0x557b8289cae6 0x557b82935663 0x557b829e245c 0x557b8293645b 0x557b829e250b 0x557b82a12f75 0x557b828b3987 0x557b82a18a2f 0x557b82a1910b 0x557b82a19309 0x7f54eaf490b3 0x557b829a00a0
All model checkpoint weights were used when initializing FlaxSpeechEncoderDecoderModel.
All the weights of FlaxSpeechEncoderDecoderModel were initialized from the model checkpoint at sanchit-gandhi/flax-wav2vec2-2-bart-large-scan.
If your task is similar to the task the model of the checkpoint was trained on, you can already use FlaxSpeechEncoderDecoderModel for predictions without further training.
05/31/2022 13:59:14 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/sanchitgandhi/cache/huggingface/datasets/mozilla-foundation___common_voice_9_0/en/9.0.0/26f54721b57ee2f31a333b315ed9151fbd8e693a3983c295fef63c67a12b9bf7/cache-fd71c792b3783510.arrow
05/31/2022 13:59:14 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/sanchitgandhi/cache/huggingface/datasets/mozilla-foundation___common_voice_9_0/en/9.0.0/26f54721b57ee2f31a333b315ed9151fbd8e693a3983c295fef63c67a12b9bf7/cache-17bf0fac4bf77ce9.arrow
05/31/2022 13:59:14 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/sanchitgandhi/cache/huggingface/datasets/mozilla-foundation___common_voice_9_0/en/9.0.0/26f54721b57ee2f31a333b315ed9151fbd8e693a3983c295fef63c67a12b9bf7/cache-0583d2ddfeb267a2.arrow
05/31/2022 13:59:14 - WARNING - datasets.fingerprint - Parameter 'function'=.prepare_dataset at 0x7f54d54059d0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
preprocess train dataset: 0% 0/890116 [00:00