Model Details

Model Description

Wav2Vec 2.0 model trained with an early-exit pipeline.

  • Developed by: SpeechTek unit, Fondazione Bruno Kessler
  • Model type: Wav2Vec 2.0
  • Language(s) (NLP): English
  • Finetuned from model: facebook/wav2vec2-base-960h
  • Repository: https://github.com/augustgw/wav2vec2-ee
  • Paper: Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch

Downstream Use

The model is intended for computationally efficient ASR: each early exit produces a transcription at reduced encoder depth, trading recognition accuracy for inference cost.
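
As an illustration of the early-exit idea only (not the repository's actual inference code), the sketch below selects the first exit whose CTC output looks confident enough; the entropy criterion, the threshold value, and the per-exit logits are all assumptions made for the example.

import torch

def pick_exit(per_exit_logits, entropy_threshold=0.2):
    # Return the index and logits of the first exit whose average
    # per-frame entropy falls below the threshold (illustrative only).
    for i, logits in enumerate(per_exit_logits):
        probs = torch.softmax(logits, dim=-1)  # (frames, vocab)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
        if entropy < entropy_threshold:
            return i, logits
    # No exit was confident enough: fall back to the final layer.
    return len(per_exit_logits) - 1, per_exit_logits[-1]

# Toy usage: six exits, 50 frames, 32-symbol CTC vocabulary.
dummy_logits = [torch.randn(50, 32) for _ in range(6)]
exit_idx, logits = pick_exit(dummy_logits)
print(f"Decoded at exit {exit_idx + 1}")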

Training Details

Training Data

The model is trained using the LibriSpeech-960h dataset.
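
For reference, the 960-hour training set can be assembled from the Hugging Face librispeech_asr dataset roughly as sketched below. This assumes the datasets library and the "all" configuration; the repository's training scripts may prepare the data differently.

from datasets import load_dataset, concatenate_datasets

# The three LibriSpeech training splits together form the 960 h set.
splits = [
    load_dataset("librispeech_asr", "all", split=s)
    for s in ("train.clean.100", "train.clean.360", "train.other.500")
]
train_960h = concatenate_datasets(splits)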

Training Procedure

Basic training

  • Fine-tuning with only EE loss: finetune_ee.py
  • Fine-tuning a model without early exits: finetune_non-ee.py
  • Change model_config = Wav2Vec2Config(num_hidden_layers=X) to set the number of layers in the encoder, e.g., model_config = Wav2Vec2Config(num_hidden_layers=4) for a 4-layer encoder (see the sketch after this list).
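
A minimal sketch of how the reduced-depth configuration combines with the pre-trained checkpoint, assuming the standard Hugging Face Wav2Vec2ForCTC class (the early-exit heads themselves are defined in the repository's scripts):

from transformers import Wav2Vec2Config, Wav2Vec2ForCTC

# Shrink the encoder to 4 transformer layers; weights for the unused
# layers of the 12-layer checkpoint are simply not loaded.
model_config = Wav2Vec2Config(num_hidden_layers=4)
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base-960h",
    config=model_config,
)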

Training Hyperparameters

training_args = TrainingArguments(
    output_dir="./wav2vec2-ee/checkpoints/",
    evaluation_strategy="no",
    # eval_steps=1000,
    save_strategy="epoch",
    # eval_accumulation_steps=10,
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=1,
    num_train_epochs=100,
    weight_decay=0.01,
    push_to_hub=False,
    report_to="wandb",
    logging_strategy="steps",
    logging_steps=1000,
    dataloader_num_workers=1,
    ignore_data_skip=True,
)
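
For context, these arguments would plug into the Hugging Face Trainer roughly as below; the model, dataset, and data collator are placeholders here, since finetune_ee.py supplies its own early-exit model, collator, and loss.

from transformers import Trainer

# Generic sketch: model, train_960h, and data_collator stand in for the
# objects built by the repository's training scripts.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_960h,
    data_collator=data_collator,
)
trainer.train()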

Evaluation

The evaluation scripts write their outputs to the indicated output directory: wer_results.txt contains the layer-wise WERs on the test sets specified in the evaluation script, while the remaining files contain the layer-wise transcriptions of each item in each test set (a minimal sketch of computing WER from such transcriptions follows the commands below).

Basic evaluation

  • Normal evaluation: eval.py path/to/model/checkpoint path/to/output/directory
    • For safetensors checkpoints saved by newer versions of Hugging Face Transformers, see the note in eval.py
  • Evaluation for models without early exits (evaluates only the output of the final layer): eval_non-ee.py path/to/model/checkpoint path/to/output/directory
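
The layer-wise WERs in wer_results.txt can be recomputed from the transcription files with any WER implementation. Below is a minimal sketch using the evaluate package (an assumption, not necessarily what the scripts use) on a toy reference/hypothesis pair.

import evaluate

# Compute the WER of one exit's hypotheses against the references.
# These strings are placeholders; the actual per-layer transcriptions
# are written by eval.py into the output directory.
wer_metric = evaluate.load("wer")
references = ["a man said to the universe sir i exist"]
hypotheses = ["a man said to the universe sir i exit"]
print(wer_metric.compute(predictions=hypotheses, references=references))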

Results

Exit      Test-Clean WER (%)   Dev-Clean WER (%)
Exit(1)   19.14                19.06
Exit(2)    8.26                 8.01
Exit(3)    5.93                 5.57
Exit(4)    4.74                 4.48
Exit(5)    3.98                 3.79
Exit(6)    3.95                 3.69

Citation

@inproceedings{wright2024training,
  title={Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch},
  author={Wright, George August and Cappellazzo, Umberto and Zaiem, Salah and Raj, Desh and Yang, Lucas Ondel and Falavigna, Daniele and Ali, Mohamed Nabih and Brutti, Alessio},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
  pages={685--689},
  year={2024},
  organization={IEEE}
}