techiaith/wav2vec2-base-cy



	---
	license: apache-2.0
	language:
	- cy
	tags:
	- speech
	---

	# Pre-training wav2vec2 models for Welsh speech recognition

	At present, the best Welsh speech recognition models are obtained by fine-tuning the https://huggingface.co/facebook/wav2vec2-large-xlsr-53 and https://huggingface.co/facebook/wav2vec2-xls-r-1b models from Facebook/Meta AI.

	This model is an experiment in pre-training better models on more Welsh-language speech, with the aim of lowering word error rate (WER) scores even further in subsequently fine-tuned models. The work draws heavily on resources and documentation from the HuggingFace examples:

	https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-pretraining
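
Since WER is the metric this work aims to lower, a minimal sketch of how it is usually computed may help: the word-level edit distance between a reference transcript and a model's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the Welsh example sentence are illustrative assumptions, not part of this repository, and real evaluations typically normalise text (case, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the hypothesis drops one of five reference words.
wer = word_error_rate("mae hi yn braf heddiw", "mae hi braf heddiw")
print(f"WER: {wer:.2f}")
```

A lower value is better, and the score can exceed 1.0 when the hypothesis contains many inserted words.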

	This base model has been pre-trained on only approximately 4000 hours of Welsh and English speech collected from various channels on YouTube. Only 25% of the corpus is Welsh-language speech. The English-language portion includes Welsh-accented English and has therefore been retained for pre-training.
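
As a rough illustration of what those proportions imply, the sketch below derives the approximate hours per language from the two figures quoted above. Both inputs are approximations, so the results are estimates only; the variable names are illustrative.

```python
# Figures quoted in this model card; both are approximate.
total_hours = 4000
welsh_fraction = 0.25

# Estimated split of the pre-training corpus by language.
welsh_hours = total_hours * welsh_fraction
english_hours = total_hours - welsh_hours

print(f"Welsh:   ~{welsh_hours:.0f} hours")
print(f"English: ~{english_hours:.0f} hours")
```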

	Until we have collected many more hours of speech, this pre-trained model will be of limited use for fine-tuning on downstream tasks.