SpeechTek/EE-Wav2Vec2

Model Details

Model Description

A Wav2Vec 2.0 model trained with an early-exit pipeline.

Developed by: SpeechTek unit, Fondazione Bruno Kessler
Model type: Wav2Vec 2.0
Language(s) (NLP): English
Finetuned from model: facebook/wav2vec2-base-960h
Repository: https://github.com/augustgw/wav2vec2-ee
Paper: Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch

Downstream Use

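The model is intended for computationally efficient automatic speech recognition (ASR): its early exits make it possible to transcribe from intermediate encoder layers instead of always running the full encoder.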
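The card does not state how an exit is selected at inference time. One common approach with early-exit models is to stop at the first exit whose output is confident enough, for example when the average per-frame entropy of its posteriors falls below a threshold. The following is a minimal sketch of that idea in plain Python, not the repository's implementation: the function names, the threshold value, and the toy posteriors are all assumptions made for illustration.

```python
import math
from typing import Iterable, Sequence

Frame = Sequence[float]  # one probability distribution over output tokens


def mean_frame_entropy(posteriors: Sequence[Frame]) -> float:
    """Average Shannon entropy (in nats) over all frames of one exit."""
    if not posteriors:
        raise ValueError("need at least one frame")
    total = 0.0
    for frame in posteriors:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def choose_exit(posteriors_by_exit: Iterable[Sequence[Frame]], threshold: float) -> int:
    """Return the 1-based index of the first exit whose mean entropy is
    at or below `threshold`; fall back to the last exit otherwise."""
    last = 0
    for last, posteriors in enumerate(posteriors_by_exit, start=1):
        if mean_frame_entropy(posteriors) <= threshold:
            return last
    if last == 0:
        raise ValueError("need at least one exit")
    return last


# Toy data (hypothetical): exit 1 is uncertain, exit 2 is confident.
uncertain = [[0.25, 0.25, 0.25, 0.25]]
confident = [[0.97, 0.01, 0.01, 0.01]]
selected = choose_exit([uncertain, confident], threshold=0.5)
print(selected)
```

Because choose_exit consumes any iterable, a generator that runs the encoder one exit at a time would allow the deeper layers to be skipped once an exit is accepted.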

Training Details

Training Data

The model is trained using the LibriSpeech-960h dataset.

Training Procedure

Basic training

Fine-tuning with only EE loss: finetune_ee.py
Fine-tuning a model without early exits: finetune_non-ee.py
Change model_config = Wav2Vec2Config(num_hidden_layers=X) to set the number of layers in the encoder. For example, for a 4-layer encoder: model_config = Wav2Vec2Config(num_hidden_layers=4)

Training Hyperparameters

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-ee/checkpoints/",
    evaluation_strategy="no",
    # eval_steps=1000,
    save_strategy="epoch",
    # eval_accumulation_steps=10,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    num_train_epochs=100,
    weight_decay=0.01,
    push_to_hub=False,
    report_to="wandb",
    logging_strategy="steps",
    logging_steps=1000,
    dataloader_num_workers=1,
    ignore_data_skip=True,
)

Evaluation

The evaluation scripts create files in the indicated output directory. wer_results.txt contains the layerwise WERs on the test sets indicated in the evaluation script. The remaining files contain the layerwise transcriptions of each item in each test set.
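For reference, word error rate (WER) is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below computes a corpus-level WER in plain Python. It only illustrates the metric and is not necessarily how eval.py computes it; the function names and the example sentences are assumptions.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """WER in percent: total word errors over total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = corpus_wer(["the cat sat on the mat"], ["the cat sit on mat"])
print(f"{wer:.2f}")  # prints 33.33
```

A layerwise report would apply corpus_wer once per exit, using the transcriptions produced by that exit.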

Basic evaluation

Normal evaluation: eval.py path/to/model/checkpoint path/to/output/directory
- For safetensors checkpoints saved by newer versions of Hugging Face Transformers, see the note in eval.py
Evaluation for models without early exits (evaluates only output of final layer): eval_non-ee.py path/to/model/checkpoint path/to/output/directory

Results

Exit	Test-Clean WER (%)	Dev-Clean WER (%)
Exit(1)	19.14	19.06
Exit(2)	8.26	8.01
Exit(3)	5.93	5.57
Exit(4)	4.74	4.48
Exit(5)	3.98	3.79
Exit(6)	3.95	3.69
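The table can be read as an accuracy/compute trade-off: a deployment can pick the earliest exit that still meets a target WER. Below is a small sketch using the Test-Clean column above; the helper name and the 5.0 budget are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Test-Clean WERs copied from the results table, keyed by exit index.
TEST_CLEAN_WER: Dict[int, float] = {
    1: 19.14,
    2: 8.26,
    3: 5.93,
    4: 4.74,
    5: 3.98,
    6: 3.95,
}


def earliest_exit_within(budget: float, wer_by_exit: Dict[int, float]) -> Optional[int]:
    """Return the lowest exit index whose WER is at or below `budget`,
    or None if no exit meets it."""
    for exit_idx in sorted(wer_by_exit):
        if wer_by_exit[exit_idx] <= budget:
            return exit_idx
    return None


chosen = earliest_exit_within(5.0, TEST_CLEAN_WER)
print(chosen)  # prints 4
```

With these numbers, the last two exits differ by only 0.03 WER on Test-Clean, so stopping at exit 5 gives up very little accuracy relative to exit 6.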

Citation

@inproceedings{wright2024training,
  title={Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch},
  author={Wright, George August and Cappellazzo, Umberto and Zaiem, Salah and Raj, Desh and Yang, Lucas Ondel and Falavigna, Daniele and Ali, Mohamed Nabih and Brutti, Alessio},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
  pages={685--689},
  year={2024},
  organization={IEEE}
}
