
wav2vec2-large-xls-r-300m-telugu

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the OpenSLR SLR66 (Telugu) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2464
  • Wer: 0.3412

Model description

wav2vec2-large-xls-r-300m-telugu is a speech recognition model fine-tuned for the Telugu language using the Wav2Vec2 architecture. It starts from the pre-trained XLS-R 300M checkpoint (about 316M parameters), which learns speech representations from raw audio and was pretrained on roughly 436,000 hours of unlabeled speech spanning 128 languages.

Wav2Vec2 models are capable of learning strong acoustic representations and have shown state-of-the-art performance for ASR tasks across multiple languages, including low-resource languages like Telugu.

Intended uses & limitations

Intended uses: Automatic Speech Recognition (ASR) for the Telugu language. The model can be used in voice assistants, transcription services, or other applications that require Telugu speech-to-text conversion (see the usage sketch below).

Limitations: The model was trained on a single dataset (OpenSLR SLR66) and may not generalize perfectly to all varieties of Telugu. Performance may degrade on noisy audio or on dialects and accents that were under-represented in the training data. The model outputs raw, unpunctuated transcriptions (casing is not applicable to Telugu script).
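
A minimal usage sketch with the standard Transformers CTC API; the file name telugu_sample.wav is a placeholder for any Telugu audio clip:

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "kaarthu2003/wav2vec2-large-xls-r-300m-telugu"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# XLS-R expects 16 kHz mono audio; librosa resamples on load
speech, _ = librosa.load("telugu_sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame;
# batch_decode collapses repeats and strips blank tokens
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```

As noted above, the output is raw Telugu text without punctuation.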

Training and evaluation data

This model was fine-tuned on the OpenSLR SLR66 dataset, a crowdsourced, multi-speaker corpus of transcribed Telugu speech. It is one of the standard corpora for ASR research on low-resource languages.

  • Dataset Name: OpenSLR
  • Dataset Config: SLR66
  • Dataset Split: Train
  • Task Type: Automatic Speech Recognition
  • Language: Telugu (te)
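
A sketch of loading the training data with the Hugging Face datasets library, assuming the script-based openslr dataset with its SLR66 config (the sentence and audio column names come from that loading script, and recent datasets releases require trust_remote_code for script datasets):

```python
from datasets import load_dataset, Audio

# SLR66 (Telugu) ships only a train split; a held-out eval set
# has to be carved out manually, e.g. via train_test_split
ds = load_dataset("openslr", "SLR66", split="train", trust_remote_code=True)

# Resample to the 16 kHz rate the model expects
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds[0]
print(sample["sentence"])              # reference transcription
print(sample["audio"]["array"].shape)  # raw waveform as a NumPy array
```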

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a TrainingArguments sketch mirroring them follows the list:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30
  • mixed_precision_training: Native AMP
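
As a rough sketch, the list above maps onto transformers TrainingArguments as follows; output_dir and the 400-step eval/logging cadence are assumptions (the cadence is inferred from the results table below), and the Adam betas/epsilon listed above are already the library defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-telugu",  # assumed name
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 16 x 2 = total train batch size of 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=30,
    fp16=True,                      # Native AMP mixed-precision training
    eval_strategy="steps",          # assumed: the table below logs every 400 steps
    eval_steps=400,
    logging_steps=400,
)
```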

Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer    |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 6.0648        | 3.5874  | 400  | 0.9576          | 0.8553 |
| 0.5387        | 7.1749  | 800  | 0.3074          | 0.5044 |
| 0.2438        | 10.7623 | 1200 | 0.2602          | 0.3833 |
| 0.1551        | 14.3498 | 1600 | 0.2547          | 0.3872 |
| 0.116         | 17.9372 | 2000 | 0.2623          | 0.3844 |
| 0.0864        | 21.5247 | 2400 | 0.2482          | 0.3617 |
| 0.0673        | 25.1121 | 2800 | 0.2471          | 0.3459 |
| 0.0535        | 28.6996 | 3200 | 0.2464          | 0.3412 |
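
For reference, Wer is the word error rate: substitutions, insertions, and deletions divided by the number of reference words. A minimal sketch of computing it with the evaluate library (the strings are placeholders):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder hypothesis/reference pairs purely to illustrate the call
predictions = ["hypothesis transcription one", "hypothesis two"]
references = ["reference transcription one", "reference two"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # 0.0 is perfect; the final checkpoint above reaches 0.3412
```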

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
