
wav2vec2-large-xls-r-300m-telugu

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the OpenSLR SLR66 (Telugu) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2464
  • Wer: 0.3412

Model description

wav2vec2-large-xls-r-300m-telugu is a speech recognition model fine-tuned for the Telugu language using the Wav2Vec2 architecture. It starts from the pre-trained XLS-R 300M checkpoint (about 316M parameters), which learns speech representations from raw audio and was pretrained on roughly 436,000 hours of unlabeled speech spanning 128 languages.

Wav2Vec2 models are capable of learning strong acoustic representations and have shown state-of-the-art performance for ASR tasks across multiple languages, including low-resource languages like Telugu.

Intended uses & limitations

Intended uses: Automatic Speech Recognition (ASR) for the Telugu language. The model can be used in voice assistants, transcription services, or other applications that require Telugu speech-to-text conversion (see the usage sketch below).

Limitations: The model was trained on a single dataset (OpenSLR SLR66) and may not generalize perfectly to all varieties of Telugu. Performance may degrade on noisy audio or on dialects and accents that were under-represented in the training data. The model outputs raw, unpunctuated transcriptions (casing is not applicable to Telugu script).
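
A minimal usage sketch with the standard Transformers CTC API; the file name telugu_sample.wav is a placeholder for any Telugu audio clip:

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "kaarthu2003/wav2vec2-large-xls-r-300m-telugu"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# XLS-R expects 16 kHz mono audio; librosa resamples on load
speech, _ = librosa.load("telugu_sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame;
# batch_decode collapses repeats and strips blank tokens
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```

As noted above, the output is raw Telugu text without punctuation.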

Training and evaluation data

This model was fine-tuned on the OpenSLR SLR66 dataset, a crowdsourced, multi-speaker corpus of transcribed Telugu speech. It is one of the standard corpora for ASR research on low-resource languages.

  • Dataset Name: OpenSLR
  • Dataset Config: SLR66
  • Dataset Split: Train
  • Task Type: Automatic Speech Recognition
  • Language: Telugu (te)
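
A sketch of loading the training data with the Hugging Face datasets library, assuming the script-based openslr dataset with its SLR66 config (the sentence and audio column names come from that loading script, and recent datasets releases require trust_remote_code for script datasets):

```python
from datasets import load_dataset, Audio

# SLR66 (Telugu) ships only a train split; a held-out eval set
# has to be carved out manually, e.g. via train_test_split
ds = load_dataset("openslr", "SLR66", split="train", trust_remote_code=True)

# Resample to the 16 kHz rate the model expects
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds[0]
print(sample["sentence"])              # reference transcription
print(sample["audio"]["array"].shape)  # raw waveform as a NumPy array
```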

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a TrainingArguments sketch mirroring them follows the list:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
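
As a rough sketch, the list above maps onto transformers TrainingArguments as follows; output_dir and the 400-step eval/logging cadence are assumptions (the cadence is inferred from the results table below), and the Adam betas/epsilon listed above are already the library defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-telugu",  # assumed name
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 16 x 2 = total train batch size of 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                      # Native AMP mixed-precision training
    eval_strategy="steps",          # assumed: the table below logs every 400 steps
    eval_steps=400,
    logging_steps=400,
)
```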

Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer    |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 6.0648        | 3.5874  | 400  | 0.9576          | 0.8553 |
| 0.5387        | 7.1749  | 800  | 0.3074          | 0.5044 |
| 0.2438        | 10.7623 | 1200 | 0.2602          | 0.3833 |
| 0.1551        | 14.3498 | 1600 | 0.2547          | 0.3872 |
| 0.116         | 17.9372 | 2000 | 0.2623          | 0.3844 |
| 0.0864        | 21.5247 | 2400 | 0.2482          | 0.3617 |
| 0.0673        | 25.1121 | 2800 | 0.2471          | 0.3459 |
| 0.0535        | 28.6996 | 3200 | 0.2464          | 0.3412 |
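
For reference, Wer is the word error rate: substitutions, insertions, and deletions divided by the number of reference words. A minimal sketch of computing it with the evaluate library (the strings are placeholders):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder hypothesis/reference pairs purely to illustrate the call
predictions = ["hypothesis transcription one", "hypothesis two"]
references = ["reference transcription one", "reference two"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # 0.0 is perfect; the final checkpoint above reaches 0.3412
```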

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
