# wav2vec2-large-xls-r-300m-telugu
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the openslr dataset. It achieves the following results on the evaluation set:
- Loss: 0.2464
- Wer: 0.3412
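The reported WER (word error rate) is the word-level edit distance between the reference and predicted transcriptions, normalized by the reference length. A minimal illustrative implementation (not the evaluation code used for this model):

```python
# Illustrative WER: Levenshtein distance over words, divided by the
# number of reference words. Real evaluations typically use a library
# such as `jiwer` or `evaluate`.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for word-level edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))  # -> 0.0
print(wer("the cat sat", "the bat sat"))  # -> 1/3 (one substitution in three words)
```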
## Model description
wav2vec2-large-xls-r-300m-telugu is a speech recognition model fine-tuned for the Telugu language using the Wav2Vec2 architecture. It builds on the pre-trained XLS-R 300M checkpoint, which learns speech representations directly from raw audio and was pre-trained on large amounts of unlabeled multilingual speech.
Wav2Vec2 models are capable of learning strong acoustic representations and have shown state-of-the-art performance for ASR tasks across multiple languages, including low-resource languages like Telugu.
## Intended uses & limitations
- **Intended uses:** Automatic Speech Recognition (ASR) for the Telugu language. The model can be used in voice assistants, transcription services, or other applications that require Telugu speech-to-text conversion.
- **Limitations:** The model is trained on a single dataset (OpenSLR SLR66) and may not generalize perfectly to all varieties of Telugu. Performance may degrade on noisy audio or on dialects and accents that were under-represented in the training data. The model outputs raw, unpunctuated transcriptions.
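A minimal inference sketch for the uses above, assuming the checkpoint is published on the Hub under `kaarthu2003/wav2vec2-large-xls-r-300m-telugu` (the helper name `transcribe` is illustrative, not part of the model):

```python
# Sketch: greedy CTC decoding of a 16 kHz mono waveform into Telugu text.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "kaarthu2003/wav2vec2-large-xls-r-300m-telugu"

def transcribe(waveform, sampling_rate=16000):
    """Transcribe a 1-D float waveform (16 kHz mono) to Telugu text."""
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    # Collapse repeated tokens / blanks and map ids back to characters.
    return processor.batch_decode(pred_ids)[0]
```

Audio sampled at rates other than 16 kHz should be resampled first (e.g. with `torchaudio` or `librosa`), since the model was trained on 16 kHz input.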
## Training and evaluation data

This model was fine-tuned on the OpenSLR SLR66 dataset, a crowd-sourced corpus of spoken Telugu. It is one of the standard datasets used for ASR in low-resource languages.
- Dataset name: OpenSLR
- Dataset config: SLR66
- Dataset split: train
- Task type: Automatic Speech Recognition
- Language: Telugu (te)
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 30
- mixed_precision_training: Native AMP
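These hyperparameters correspond to a Hugging Face `TrainingArguments` configuration along the following lines (a sketch; `output_dir` and the save/eval cadence are assumptions not stated in the card):

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-telugu",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch = 16 * 2 = 32
    warmup_steps=500,
    lr_scheduler_type="linear",
    num_train_epochs=30,
    seed=42,
    fp16=True,                       # native AMP mixed precision
)
```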
### Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 6.0648        | 3.5874  | 400  | 0.9576          | 0.8553 |
| 0.5387        | 7.1749  | 800  | 0.3074          | 0.5044 |
| 0.2438        | 10.7623 | 1200 | 0.2602          | 0.3833 |
| 0.1551        | 14.3498 | 1600 | 0.2547          | 0.3872 |
| 0.116         | 17.9372 | 2000 | 0.2623          | 0.3844 |
| 0.0864        | 21.5247 | 2400 | 0.2482          | 0.3617 |
| 0.0673        | 25.1121 | 2800 | 0.2471          | 0.3459 |
| 0.0535        | 28.6996 | 3200 | 0.2464          | 0.3412 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1