---
license: mit
datasets:
- thennal/IMaSC
language:
- ml
model-index:
- name: Malwhisper-v1-medium
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
      args: ml
    metrics:
    - type: wer
      value: 61.84
      name: WER
    - type: cer
      value: 15.41
      name: CER
library_name: transformers
---

# Malwhisper-v1-medium

This model is a version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) fine-tuned on the [IMaSC dataset](https://www.kaggle.com/datasets/thennal/imasc).

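A minimal inference sketch using the `transformers` automatic-speech-recognition pipeline is given here for orientation. It rests on several assumptions: the repository ID is a placeholder (this card does not state the full Hub path), `transcribe` is a hypothetical helper name, and `sample.wav` is an illustrative file name. Decoding an audio file from a path also assumes `ffmpeg` is available on the system.

```python
# Placeholder (assumption): replace with the actual Hub repository ID
# or with a local path to the fine-tuned checkpoint.
MODEL_ID = "<namespace>/Malwhisper-v1-medium"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with a fine-tuned Whisper checkpoint.

    Hypothetical helper for illustration; not part of the original card.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second audio windows
    )
    # The pipeline returns a dict; the transcript is under the "text" key.
    return asr(audio_path)["text"]


# Example call (assumes a local file named "sample.wav" exists):
# print(transcribe("sample.wav"))
```

Building the pipeline inside the function keeps the sketch short; for batch transcription it would be cheaper to build the pipeline once and reuse it.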
## About Dataset

IMaSC is a Malayalam text and speech corpus made available by ICFOSS for the purpose of developing speech technology for Malayalam, particularly text-to-speech. The corpus contains 34,473 text-audio pairs of Malayalam sentences spoken by 8 speakers, totalling approximately 50 hours of audio.

## Training

[Script used for training](https://github.com/kurianbenoy/Keyword_generator_project/blob/main/Whisper_IMASC_final_e2eofficerun.ipynb)

[Training run](https://wandb.ai/hello34/wandb_whisper_e2e/runs/q2xlvbw5)

[Experiment tracking with Weights and Biases](https://wandb.ai/hello34/wandb_whisper_e2e)

- GPUs used: A100 (80 GB)
- Training time: 16 hours
- This project was built with an A100 80 GB GPU provided by [E2E Cloud during their open hack day](https://www.eventbrite.com/e/open-hack-day-tickets-783582435157)

## Evaluation

The fine-tuned model was evaluated on the following datasets:

**On the Mozilla Common Voice 11.0 dataset (Malayalam subset):**

WER - 61.84

CER - 15.41

**On the SMC Malayalam Speech Corpus dataset:**

WER - 70.49

CER - 17.0
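
For readers unfamiliar with these metrics, the sketch below shows how word error rate (WER) and character error rate (CER) are commonly defined: the total edit distance between reference and hypothesis, divided by the total number of reference units, expressed as a percentage. This is a plain-Python illustration, not the evaluation code behind the numbers above; this card does not state which tooling or text normalization was used. The function names `edit_distance` and `error_rate` are hypothetical, and the sample sentences are made up.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word") -> float:
    """Corpus-level error rate in percent: total edits / total reference units.

    unit="word" gives WER; unit="char" gives CER. Characters are Unicode
    code points here and spaces count as characters (an assumption; other
    tools may normalize text differently).
    """
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h))
                for r, h in zip(references, hypotheses))
    total = sum(len(split(r)) for r in references)
    return 100.0 * edits / total


# Made-up example: one wrong word in each of two sentences (5 reference words).
refs = ["the cat sat", "hello world"]
hyps = ["the cat sit", "hello word"]
print(f"WER: {error_rate(refs, hyps, unit='word'):.2f}")  # 2 edits / 5 words
print(f"CER: {error_rate(refs, hyps, unit='char'):.2f}")  # 2 edits / 22 characters
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER, which is consistent with the gap between the two metrics reported above.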