Enhanced Multilingual Code-Switched Speech Recognition for Low-Resource Languages Using Transformer-Based Models and Dynamic Switching Algorithms
Model description
This model recognizes code-switched speech in Hindi and Marathi. It is fine-tuned from the wav2vec2-large-xls-r-300m transformer model and uses three reinforcement learning methods, Q-Learning, SARSA, and Deep Q-Networks (DQN), to determine optimal switch points in code-switched speech.
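The card does not say how states, actions, or rewards are defined for switch-point detection, so the following is only a minimal sketch of the textbook tabular Q-Learning and SARSA update rules. The action set (`stay`/`switch`), the state encoding, and the values of `alpha` and `gamma` are all hypothetical and are not taken from the model's actual training code.

```python
from collections import defaultdict

# Hypothetical action set: keep decoding in the current language or switch.
ACTIONS = ("stay", "switch")


def make_q_table():
    """Return a Q-table mapping (state, action) to a value, defaulting to 0.0."""
    return defaultdict(float)


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best-valued action in next_state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in next_state."""
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    # Hypothetical states: (segment index, current language).
    q = make_q_table()
    q_learning_update(q, (0, "hi"), "switch", reward=1.0, next_state=(1, "mr"))
    print(dict(q))
```

The only difference between the two rules is the bootstrap term: Q-Learning uses the maximum over next actions, while SARSA uses the value of the action the policy actually chose. A DQN replaces the table with a neural network that approximates the same values.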
Intended uses & limitations
Intended uses
- Automatic speech recognition for multilingual environments involving Hindi and Marathi.
- Research in multilingual ASR and code-switching phenomena.
Limitations
- The model may exhibit biases inherent in the training data.
- Recognition accuracy may degrade on heavily accented or dialectal speech that is not covered in the training dataset.
Training params and experimental info
The model was fine-tuned using the following parameters:
- Attention Dropout: 0.1
- Hidden Dropout: 0.1
- Feature Projection Dropout: 0.1
- Layerdrop: 0.1
- Learning Rate: 3e-4
- Mask Time Probability: 0.05
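For readers who want to reproduce the setup, the parameters above can be written down as plain Python dictionaries. The values come from the list; the key names are an assumption, chosen to match the `Wav2Vec2Config` and `TrainingArguments` parameter names in the `transformers` library, and the split into two dictionaries is illustrative rather than the author's actual script.

```python
# Values are from the parameter list above. Key names are assumed to follow
# the Wav2Vec2Config fields in the transformers library.
model_config_overrides = {
    "attention_dropout": 0.1,
    "hidden_dropout": 0.1,
    "feat_proj_dropout": 0.1,   # "Feature Projection Dropout"
    "layerdrop": 0.1,
    "mask_time_prob": 0.05,     # "Mask Time Probability"
}

# The learning rate is an optimizer setting rather than a model setting,
# so it would normally go to TrainingArguments (assumed key name).
training_overrides = {
    "learning_rate": 3e-4,
}

if __name__ == "__main__":
    for name, value in {**model_config_overrides, **training_overrides}.items():
        print(f"{name} = {value}")
```

In a typical fine-tuning script these dictionaries would be unpacked as keyword arguments, for example into `Wav2Vec2ForCTC.from_pretrained(...)` and `TrainingArguments(...)`. Whether this model was trained that way is not stated in the card.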
Training dataset
The model was trained on the Common Voice dataset, which includes diverse speech samples in both Hindi and Marathi. The dataset was augmented with synthetically generated code-switched speech to improve the model's robustness in handling code-switching scenarios.
Evaluation results
The model achieved the following performance metrics on the test set:
- Word Error Rate (WER): 0.2800
- Character Error Rate (CER): 0.2400
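To make the two metrics concrete, here is a minimal, dependency-free sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. This is the standard definition and not necessarily the evaluation script used for this model; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("a b c d", "a x c d"))  # one substituted word out of four
    print(cer("abcd", "abxd"))        # one substituted character out of four
```

Under this definition, a WER of 0.2800 means the number of word-level edits is 28% of the number of reference words.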
Dataset used to train Hemantrao/wav2vec2-large-xls-r-300m-hindi_marathi-code-switching-experimentx1
Evaluation results
- Word Error Rate (WER) on common_voice (internal evaluation): 0.280
- Character Error Rate (CER) on common_voice (internal evaluation): 0.240