Pre-training wav2vec2 models for Welsh speech recognition

At the moment, the best Welsh speech recognition models are obtained by fine-tuning pre-trained wav2vec2 models released by Facebook/Meta AI.

This model is an experiment in pre-training better base models on more Welsh-language speech, with the aim of lowering WER scores even further in subsequently fine-tuned models. The work draws heavily on resources and documentation from the HuggingFace examples.

This base model has been pre-trained on only approximately 4,000 hours of Welsh and English speech collected from various channels on YouTube. Only 25% of the corpus is Welsh-language speech. The English-language portion includes Welsh-accented English and has therefore been retained for pre-training.

Until we have collected many more hours of speech, this pre-trained model will be of limited use as a starting point for fine-tuning on any downstream task.
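The WER (word error rate) scores mentioned above are the usual yardstick for comparing fine-tuned speech recognition models. As a rough illustration only, and not the evaluation script used for this model, WER can be sketched as the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are assumptions made for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts (hypothetical, not taken from the training corpus).
reference = "mae hi'n braf heddiw"
hypothesis = "mae hi braf"
print(word_error_rate(reference, hypothesis))
```

Here the hypothesis has one substituted word and one missing word against a four-word reference. Production evaluations typically also normalise case and punctuation before scoring, which this sketch leaves out.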

Model size: 95.1M params