Malwhisper-v1-small

This model is a version of openai/whisper-small fine-tuned on the IMaSC dataset.
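A minimal sketch of how the checkpoint could be used for transcription, assuming it loads through the standard `transformers` automatic-speech-recognition pipeline like other Whisper checkpoints. The repository ID is the one shown on the model page; the `transcribe` helper and the `sample.wav` path are hypothetical names used only for illustration.

```python
# Repository ID as shown on the model page.
MODEL_ID = "smcproject/Malwhisper-v1-small"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a single audio file and return the recognized text.

    Hypothetical helper: assumes `transformers` and a backend such as
    PyTorch are installed, and that ffmpeg is available for decoding
    audio files passed by path.
    """
    # Imported here so that merely defining the helper does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]


# Example call (downloads the model on first use):
# print(transcribe("sample.wav"))  # "sample.wav" is a placeholder path
```

The first call downloads the model weights, so it needs network access and some disk space.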

About Dataset

IMaSC is a Malayalam text and speech corpus made available by ICFOSS for the purpose of developing speech technology for Malayalam, particularly text-to-speech. The corpus contains 34,473 text-audio pairs of Malayalam sentences spoken by 8 speakers, totalling approximately 50 hours of audio.

Training

  • GPUs used: T4 - 16 GB

  • Training Time: 14 hours

Evaluation

The fine-tuned model was evaluated on the SMC Malayalam Speech Corpus dataset:

WER - 73.56

CER - 17.82
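For readers unfamiliar with the two metrics, the sketch below shows how word error rate (WER) and character error rate (CER) are conventionally computed from an edit distance. It assumes the scores above are percentages, which the card does not state, and it is not the evaluation script used for this model; any text normalization applied before scoring is unknown. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent; words are whitespace-separated tokens."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; characters are Unicode code points."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# One wrong character out of 11, inside one of 3 words:
print(round(wer("the cat sat", "the cat sit"), 2))  # 33.33
print(round(cer("the cat sat", "the cat sit"), 2))  # 9.09
```

As the example shows, a single wrong character makes the whole word count as an error, which is one reason WER can be much higher than CER for the same output.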

