# Model Details

## Model Description
A Wav2Vec 2.0 model trained with an early-exit pipeline.
- Developed by: SpeechTek unit, Fondazione Bruno Kessler
- Model type: Wav2Vec 2.0
- Language(s) (NLP): English
- Finetuned from model: facebook/wav2vec2-base-960h
- Repository: https://github.com/augustgw/wav2vec2-ee
- Paper: Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch
## Downstream Use
The model is intended for computationally efficient ASR: the early exits allow transcriptions to be produced from intermediate encoder layers at reduced inference cost.
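As a rough illustration of the task, the sketch below runs plain CTC transcription with the `transformers` API using the base checkpoint this model was fine-tuned from. It is a minimal sketch only: the early-exit checkpoints themselves are meant to be run through the scripts in the linked repository (e.g. `eval.py`), which handle the additional exits.

```python
# Minimal sketch: plain Wav2Vec 2.0 CTC inference with transformers.
# Uses the base checkpoint (facebook/wav2vec2-base-960h); the early-exit
# checkpoints are evaluated with the repository's eval.py instead.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

def transcribe(waveform, sampling_rate=16_000):
    """Transcribe a 1-D 16 kHz mono waveform (list or numpy array of floats)."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```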
# Training Details

## Training Data
The model is trained using the LibriSpeech-960h dataset.
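For reference, the 960 hours of LibriSpeech can be assembled with the `datasets` library as sketched below. This is only an assumption about data loading; the repository's training scripts may obtain the corpus differently (e.g. from a local copy).

```python
# Illustrative sketch (not the repository's loader): pulling the three
# LibriSpeech training splits (~960 hours in total) via Hugging Face datasets.
from datasets import load_dataset, concatenate_datasets

splits = ["train.clean.100", "train.clean.360", "train.other.500"]
librispeech_960h = concatenate_datasets(
    [load_dataset("librispeech_asr", "all", split=s) for s in splits]
)
print(librispeech_960h)
```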
## Training Procedure

### Basic training
- Fine-tuning with only the EE loss: `finetune_ee.py`
- Fine-tuning a model without early exits: `finetune_non-ee.py`
- To set the number of layers in the encoder, change `model_config = Wav2Vec2Config(num_hidden_layers=X)`. E.g., for a 4-layer encoder: `model_config = Wav2Vec2Config(num_hidden_layers=4)` (see the sketch after this list).
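A minimal sketch of the setting referenced above: `num_hidden_layers` in `Wav2Vec2Config` fixes the encoder depth, and hence the number of possible exits. The early-exit heads and loss themselves are added by the repository's training scripts, not by this snippet.

```python
# Sketch only: configuring a 4-layer encoder. The early-exit heads/loss are
# defined in the repository's scripts (finetune_ee.py), not shown here.
from transformers import Wav2Vec2Config, Wav2Vec2ForCTC

model_config = Wav2Vec2Config(num_hidden_layers=4)  # 4-layer encoder
model = Wav2Vec2ForCTC(model_config)                # randomly initialised weights
print(model.config.num_hidden_layers)               # -> 4
```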
### Training Hyperparameters
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-ee/checkpoints/",
    evaluation_strategy="no",
    # eval_steps=1000,
    save_strategy='epoch',
    # eval_accumulation_steps=10,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    num_train_epochs=100,
    weight_decay=0.01,
    push_to_hub=False,
    report_to='wandb',
    logging_strategy='steps',
    logging_steps=1000,
    dataloader_num_workers=1,
    ignore_data_skip=True,
)
```
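For context, these arguments would typically be passed to a `Trainer` roughly as below. The actual wiring (model, datasets, data collator and the early-exit loss) is done in `finetune_ee.py`; the `model`, `train_dataset`, and `data_collator` names here are placeholders rather than objects defined in this card.

```python
# Hedged sketch of how the TrainingArguments above are consumed; see
# finetune_ee.py for the real setup. `model`, `train_dataset` and
# `data_collator` are placeholders for the objects built in that script.
from transformers import Trainer

trainer = Trainer(
    model=model,                  # Wav2Vec 2.0 model with early exits
    args=training_args,           # the TrainingArguments shown above
    train_dataset=train_dataset,  # preprocessed LibriSpeech (placeholder)
    data_collator=data_collator,  # CTC padding collator (placeholder)
)
trainer.train()
```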
# Evaluation
The evaluation scripts create files in the indicated output directory: `wer_results.txt` contains the layerwise WERs on the test sets indicated in the evaluation script, and the remaining files contain the layerwise transcriptions of each item in each test set.
## Basic evaluation
- Normal evaluation: `eval.py path/to/model/checkpoint path/to/output/directory`
- For safetensors checkpoints saved by newer versions of Hugging Face, see the note in `eval.py`.
- Evaluation for models without early exits (evaluates only the output of the final layer): `eval_non-ee.py path/to/model/checkpoint path/to/output/directory`
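If you want to re-score the layerwise transcriptions independently of `wer_results.txt`, the word error rate can be recomputed with the `jiwer` package as sketched below. The reference/hypothesis lists are placeholders; the exact layout of the transcription files is defined by `eval.py`.

```python
# Minimal sketch: re-computing WER for one exit with jiwer. The reference and
# hypothesis lists are placeholders; pair them up from the files written by
# eval.py for the exit you want to score.
import jiwer

references = ["the cat sat on the mat"]   # ground-truth transcripts
hypotheses = ["the cat sat on a mat"]     # transcriptions from a given exit
print(f"WER: {jiwer.wer(references, hypotheses):.3f}")
```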
## Results
| Exit | Test-Clean WER (%) | Dev-Clean WER (%) |
|---|---|---|
| Exit 1 | 19.14 | 19.06 |
| Exit 2 | 8.26 | 8.01 |
| Exit 3 | 5.93 | 5.57 |
| Exit 4 | 4.74 | 4.48 |
| Exit 5 | 3.98 | 3.79 |
| Exit 6 | 3.95 | 3.69 |
# Citation

```bibtex
@inproceedings{wright2024training,
  title={Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch},
  author={Wright, George August and Cappellazzo, Umberto and Zaiem, Salah and Raj, Desh and Yang, Lucas Ondel and Falavigna, Daniele and Ali, Mohamed Nabih and Brutti, Alessio},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
  pages={685--689},
  year={2024},
  organization={IEEE}
}
```