Wolof ASR Model (Based on wav2vec2-xls-r-300m)
Model Overview
This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from the Wav2Vec2.0 model wav2vec2-xls-r-300m. It aims to provide accurate transcription of Wolof audio data.
Model Details
- Model Base: wav2vec2-xls-r-300m
- Loss: 0.1604
- WER: 0.24
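For readers unfamiliar with the metric, word error rate (WER) is commonly defined as the word-level edit distance between the reference transcript and the model output, divided by the number of reference words, so 0.24 corresponds to roughly 24 errors per 100 reference words. The sketch below is a minimal pure-Python illustration of that definition. The function name `word_error_rate` and the placeholder transcripts are hypothetical, and the model card does not state which tool or text normalization was used for the reported figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder transcripts (not Wolof): one substituted word out of four.
print(word_error_rate("a b c d", "a x c d"))  # 0.25
```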
Dataset
The dataset used for training and evaluating this model is a collection from various sources, ensuring a rich and diverse set of Wolof audio samples. The collection is available on my Hugging Face account; only audio clips shorter than 6 seconds were kept.
- Training Dataset: 57 hours
- Test Dataset: 10 hours
For detailed information about the dataset, please refer to M9and2M/Wolof_ASR_dataset.
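The 6-second duration filter mentioned above can be sketched as follows. This is an illustration only, not the author's preprocessing code: the record layout (an `audio` field holding `array` and `sampling_rate`) is an assumption, as are the helper names `duration_seconds` and `keep_short_clips` and the 16 kHz toy data.

```python
MAX_DURATION_SECONDS = 6.0  # threshold stated in the dataset description


def duration_seconds(example: dict) -> float:
    """Clip length in seconds: number of samples divided by the sampling rate."""
    audio = example["audio"]  # assumed field name
    return len(audio["array"]) / audio["sampling_rate"]


def keep_short_clips(examples: list, max_seconds: float = MAX_DURATION_SECONDS) -> list:
    """Keep only the clips strictly shorter than max_seconds."""
    return [ex for ex in examples if duration_seconds(ex) < max_seconds]


# Toy records: silent clips at 16 kHz, 2 seconds and 7 seconds long.
clips = [
    {"id": "short", "audio": {"array": [0.0] * (16_000 * 2), "sampling_rate": 16_000}},
    {"id": "long", "audio": {"array": [0.0] * (16_000 * 7), "sampling_rate": 16_000}},
]
kept = keep_short_clips(clips)
print([clip["id"] for clip in kept])  # ['short']
```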
Training
The training process was adapted from the code in Finetune Wav2vec 2.0 For Speech Recognition, a framework for fine-tuning Wav2Vec2.0 on speech recognition tasks. Special thanks to the author, Duy Khanh, Le, for providing a robust and flexible training framework.
The model was trained with the following configuration:
- Seed: 19
- Training Batch Size: 4
- Gradient Accumulation Steps: 8
- Number of GPUs: 2
- Optimizer: AdamW
  - Learning Rate: 1e-6
- Scheduler: OneCycleLR
  - Max Learning Rate: 5e-5
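As a rough guide to what these settings imply, the effective batch size per optimizer update can be worked out from the batch size, the gradient accumulation steps, and the number of GPUs. The arithmetic below assumes that the batch size of 4 is per GPU, which the configuration does not state explicitly. The audio-per-update figure is only an upper bound derived from the 6-second clip limit.

```python
per_gpu_batch_size = 4            # "Training Batch Size" (assumed to be per GPU)
gradient_accumulation_steps = 8
num_gpus = 2

# Number of clips contributing to each optimizer update under that assumption.
effective_batch_size = per_gpu_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 64

# Every clip is shorter than 6 seconds, so this bounds the audio seen per update.
max_clip_seconds = 6
max_audio_per_update = effective_batch_size * max_clip_seconds
print(max_audio_per_update)  # 384
```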
Acknowledgements
This model was built using Facebook's Wav2Vec2.0 architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.
More Information
This model was developed as part of my Master's thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.
Contact
For any inquiries or questions, please contact mamadou.marone@ensea.fr