smcproject/Malwhisper-v1-small

Automatic Speech Recognition



	---
	license: mit
	datasets:
	- thennal/IMaSC
	language:
	- ml
	- en
	library_name: transformers
	---

	## Malwhisper-v1-small

	This model is a version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) fine-tuned on the [IMaSC dataset](https://www.kaggle.com/datasets/thennal/imasc).
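
A minimal usage sketch, assuming the checkpoint is published under the repository id `smcproject/Malwhisper-v1-small` and loads through the standard `transformers` ASR pipeline. The helper name `transcribe` is hypothetical and not part of this model card, and the audio path is whatever local file you supply.

```python
# Repository id taken from the page header; assumed to host a loadable checkpoint.
MODEL_ID = "smcproject/Malwhisper-v1-small"


def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file with the fine-tuned checkpoint."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    # Build a speech-recognition pipeline around the fine-tuned model.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # The pipeline returns a dict whose "text" field holds the transcript.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the weights on first use and needs `ffmpeg` available to decode audio files. Whisper works on 30-second windows, so longer recordings may need the pipeline's `chunk_length_s` argument.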

	## About Dataset

	IMaSC is a Malayalam text and speech corpus made available by ICFOSS for the purpose of developing speech technology for Malayalam, particularly text-to-speech. The corpus contains 34,473 text-audio pairs of Malayalam sentences spoken by 8 speakers, totalling approximately 50 hours of audio.

	## Training

	- GPUs used: T4 - 16 GB

	- Training Time: 14 hours

	## Evaluation

	The fine-tuned model was evaluated on the following dataset.

	On the SMC Malayalam Speech Corpus dataset:

	- WER: 73.56

	- CER: 17.82
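
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them: edit distance over words (WER) or characters (CER), divided by the reference length. It assumes corpus-level aggregation, percentage output, and no text normalization. The card does not state which conventions produced the figures above, so this is an illustration rather than the evaluation script that was used.

```python
from typing import Callable, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str],
               tokenize: Callable[[str], Sequence[str]]) -> float:
    """Total edits over total reference tokens, as a percentage."""
    edits = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        edits += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    return 100.0 * edits / total


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Word error rate: tokens are whitespace-separated words.
    return error_rate(refs, hyps, str.split)


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    # Character error rate: tokens are single characters, spaces included.
    return error_rate(refs, hyps, list)
```

A large gap between WER and CER, as reported here, typically means many words contain a small number of wrong characters: each such word counts as a full word error but contributes little to the character error.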