# ASR_yoruba_Azure
This model achieves the following results on the evaluation set:
- Loss: 0.2229
- Wer: 0.2481
## Model description
This model is based on the wav2vec2 architecture, has 965 million parameters, and was trained on 36 hours of Yoruba audio data from multiple speakers. It currently achieves a Word Error Rate (WER) of approximately 24.8%.
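As a quick sanity check, the parameter count can be read off the checkpoint itself. A minimal sketch, assuming the model is published on the Hugging Face Hub (the repository id below is a placeholder, not the actual one):

```python
# Count the parameters of a wav2vec2 CTC checkpoint.
# "your-username/ASR_yoruba_Azure" is a placeholder repository id.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("your-username/ASR_yoruba_Azure")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected to print roughly 965M
```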
## Intended uses & limitations
- The model is designed for Automatic Speech Recognition (ASR) in the Yoruba language.
- It can be used to transcribe spoken Yoruba into text, supporting applications such as voice-activated systems, automated transcription services, and linguistic research (see the inference sketch below).
- The current WER of approximately 24.8% indicates room for improvement in transcription accuracy. Performance may be affected by background noise, accents, and variation in speaker pronunciation.
- It is optimized for short audio clips (up to five minutes) due to GPU memory constraints.
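A minimal inference sketch using the transformers automatic-speech-recognition pipeline; the repository id and audio path are placeholders:

```python
# Transcribe a short Yoruba clip with the transformers ASR pipeline.
# "your-username/ASR_yoruba_Azure" and "sample_yoruba.wav" are placeholders.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="your-username/ASR_yoruba_Azure")

# chunk_length_s splits long audio into windows so clips near the
# five-minute limit stay within GPU memory.
result = asr("sample_yoruba.wav", chunk_length_s=30)
print(result["text"])
```

Note that passing a file path requires ffmpeg to be available for audio decoding; a raw numpy array of samples can be passed instead.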
## Training and evaluation data
- The model was trained on 36 hours of Yoruba audio spanning multiple speakers to capture diverse accents and speech patterns. The data includes conversational speech, read speech, and a range of audio qualities to improve robustness.
- The evaluation dataset is a representative sample of Yoruba speech held out from training. The WER of approximately 24.8% reflects the model's accuracy on this evaluation data across a variety of speech scenarios (a sketch of the WER computation follows).
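For reference, WER counts word-level substitutions, insertions, and deletions against a reference transcript. A minimal sketch of how the reported metric can be reproduced with the evaluate library; the transcripts below are illustrative, not drawn from the evaluation set:

```python
# Compute Word Error Rate between reference and predicted transcripts.
# The example strings are illustrative placeholders.
import evaluate

wer_metric = evaluate.load("wer")
references = ["bawo ni o se wa"]
predictions = ["bawo ni o se wa"]
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # 0.0 for an exact match
```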
## Training procedure
The model follows the wav2vec2 approach: pre-training on large-scale unlabeled audio followed by fine-tuning on the labeled Yoruba data described above. Fine-tuning optimized the model parameters to minimize transcription errors, with data augmentation and regularization applied to improve generalization. Training ran on high-performance GPUs to accommodate the data volume and parameter count, with periodic evaluations to monitor progress and adjust the training strategy.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 400
- num_epochs: 64
- mixed_precision_training: Native AMP
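These values map directly onto a transformers TrainingArguments configuration; a minimal sketch under that assumption (output_dir is a placeholder, and the Adam betas/epsilon listed above are the library defaults):

```python
# Map the listed hyperparameters onto a transformers TrainingArguments object.
# output_dir is a placeholder; Adam betas=(0.9, 0.999), eps=1e-8 are defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ASR_yoruba_Azure",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    lr_scheduler_type="linear",
    warmup_steps=400,
    num_train_epochs=64,
    fp16=True,                      # native AMP mixed precision
)
```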
### Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
0.2635 | 1.5238 | 400 | 0.2428 | 0.2582 |
0.2494 | 3.0476 | 800 | 0.2321 | 0.2571 |
0.2438 | 4.5714 | 1200 | 0.2315 | 0.2517 |
0.2417 | 6.0952 | 1600 | 0.2282 | 0.2591 |
0.2349 | 7.6190 | 2000 | 0.2299 | 0.2529 |
0.237 | 9.1429 | 2400 | 0.2301 | 0.2545 |
0.2355 | 10.6667 | 2800 | 0.2262 | 0.2559 |
0.2321 | 12.1905 | 3200 | 0.2290 | 0.2527 |
0.235 | 13.7143 | 3600 | 0.2265 | 0.2546 |
0.2289 | 15.2381 | 4000 | 0.2260 | 0.2551 |
0.2305 | 16.7619 | 4400 | 0.2267 | 0.2519 |
0.2314 | 18.2857 | 4800 | 0.2308 | 0.2583 |
0.2283 | 19.8095 | 5200 | 0.2243 | 0.2486 |
0.2288 | 21.3333 | 5600 | 0.2288 | 0.2563 |
0.2303 | 22.8571 | 6000 | 0.2244 | 0.2466 |
0.2275 | 24.3810 | 6400 | 0.2266 | 0.2471 |
0.2261 | 25.9048 | 6800 | 0.2264 | 0.2509 |
0.2271 | 27.4286 | 7200 | 0.2244 | 0.2494 |
0.2321 | 28.9524 | 7600 | 0.2257 | 0.2477 |
0.2261 | 30.4762 | 8000 | 0.2243 | 0.2533 |
0.2247 | 32.0 | 8400 | 0.2255 | 0.2449 |
0.2229 | 33.5238 | 8800 | 0.2268 | 0.2471 |
0.2242 | 35.0476 | 9200 | 0.2233 | 0.2459 |
0.2299 | 36.5714 | 9600 | 0.2268 | 0.2527 |
0.2272 | 38.0952 | 10000 | 0.2248 | 0.2471 |
0.2242 | 39.6190 | 10400 | 0.2249 | 0.2462 |
0.2249 | 41.1429 | 10800 | 0.2245 | 0.2469 |
0.2244 | 42.6667 | 11200 | 0.2249 | 0.2534 |
0.2264 | 44.1905 | 11600 | 0.2247 | 0.2457 |
0.2252 | 45.7143 | 12000 | 0.2237 | 0.2464 |
0.2239 | 47.2381 | 12400 | 0.2240 | 0.2495 |
0.2268 | 48.7619 | 12800 | 0.2240 | 0.2494 |
0.2264 | 50.2857 | 13200 | 0.2243 | 0.2528 |
0.2244 | 51.8095 | 13600 | 0.2238 | 0.2495 |
0.2236 | 53.3333 | 14000 | 0.2226 | 0.2475 |
0.2266 | 54.8571 | 14400 | 0.2230 | 0.2470 |
0.225 | 56.3810 | 14800 | 0.2232 | 0.2453 |
0.2233 | 57.9048 | 15200 | 0.2227 | 0.2467 |
0.223 | 59.4286 | 15600 | 0.2226 | 0.2496 |
0.224 | 60.9524 | 16000 | 0.2226 | 0.2472 |
0.2225 | 62.4762 | 16400 | 0.2229 | 0.2481 |
### Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1