---
license: cc-by-2.0
datasets:
- openslr/librispeech_asr
language:
- en
metrics:
- wer
base_model:
- facebook/wav2vec2-base-960h
pipeline_tag: automatic-speech-recognition
library_name: transformers
---

## Model Details

### Model Description

A Wav2Vec 2.0 model trained with an early-exit (EE) pipeline: exit branches attached at intermediate encoder layers allow inference to stop early, trading some accuracy for reduced computation.

- **Developed by:** SpeechTek unit, Fondazione Bruno Kessler
- **Model type:** Wav2Vec 2.0
- **Language(s) (NLP):** English
- **Finetuned from model:** facebook/wav2vec2-base-960h
- **Repository:** https://github.com/augustgw/wav2vec2-ee
- **Paper:** Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch

### Downstream Use

The model is intended for computationally efficient ASR tasks.

## Training Details

### Training Data

The model is trained on the full 960 hours of the LibriSpeech corpus.

### Training Procedure

#### Basic training

- Fine-tuning with only the EE loss: `finetune_ee.py`
- Fine-tuning a model without early exits: `finetune_non-ee.py`
  - Change `model_config = Wav2Vec2Config(num_hidden_layers=X)` to set the number of layers in the encoder. E.g., for a 4-layer encoder: `model_config = Wav2Vec2Config(num_hidden_layers=4)`

#### Training Hyperparameters

```python
training_args = TrainingArguments(
    output_dir="./wav2vec2-ee/checkpoints/",
    evaluation_strategy="no",
    # eval_steps=1000,
    save_strategy="epoch",
    # eval_accumulation_steps=10,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    num_train_epochs=100,
    weight_decay=0.01,
    push_to_hub=False,
    report_to="wandb",
    logging_strategy="steps",
    logging_steps=1000,
    dataloader_num_workers=1,
    ignore_data_skip=True,
)
```

## Evaluation

The evaluation scripts write their output to the indicated directory. `wer_results.txt` contains the layerwise WERs on the test sets named in the evaluation script; the remaining files contain the layerwise transcriptions of each item in each test set.

### Basic evaluation

- Normal evaluation: `eval.py path/to/model/checkpoint path/to/output/directory`
  - For safetensors checkpoints saved by newer versions of Hugging Face Transformers, see the note in `eval.py`.
- Evaluation of models without early exits (evaluates only the output of the final layer): `eval_non-ee.py path/to/model/checkpoint path/to/output/directory`

### Results

WER (%) at each exit on LibriSpeech:

| Exit   | Test-Clean | Dev-Clean |
|--------|------------|-----------|
| Exit 1 | 19.14      | 19.06     |
| Exit 2 | 8.26       | 8.01      |
| Exit 3 | 5.93       | 5.57      |
| Exit 4 | 4.74       | 4.48      |
| Exit 5 | 3.98       | 3.79      |
| Exit 6 | 3.95       | 3.69      |

## Citation

```
@inproceedings{wright2024training,
  title={Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch},
  author={Wright, George August and Cappellazzo, Umberto and Zaiem, Salah and Raj, Desh and Yang, Lucas Ondel and Falavigna, Daniele and Ali, Mohamed Nabih and Brutti, Alessio},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
  pages={685--689},
  year={2024},
  organization={IEEE}
}
```
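
## Example Usage

For a quick sanity check of a checkpoint, the sketch below loads a saved model through the standard Transformers CTC interface and scores a single LibriSpeech utterance. This is a minimal illustration under stated assumptions, not the repository's `eval.py`: the `checkpoint` path is a placeholder, plain `Wav2Vec2ForCTC` exposes only the final exit, and the intermediate exits require the custom model classes from https://github.com/augustgw/wav2vec2-ee.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder: a checkpoint saved by finetune_ee.py or finetune_non-ee.py.
# Plain Wav2Vec2ForCTC exposes only the final exit; the intermediate exits
# need the custom classes from https://github.com/augustgw/wav2vec2-ee.
checkpoint = "path/to/model/checkpoint"

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)
model.eval()

# Stream one 16 kHz utterance from LibriSpeech test-clean.
dataset = load_dataset("openslr/librispeech_asr", "clean", split="test", streaming=True)
item = next(iter(dataset))

inputs = processor(item["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding of the final exit.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)

# LibriSpeech references and this model's vocabulary are both uppercase,
# so the strings can be compared directly.
wer = evaluate.load("wer")
print(wer.compute(predictions=[transcription], references=[item["text"]]))
```

Running the repository's `eval.py` as described above is what produces the layerwise WERs in the Results table; this sketch corresponds only to the final-exit row.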