--- license: apache-2.0 base_model: openai/whisper-small tags: - generated_from_trainer datasets: - mozilla-foundation/common_voice_15_0 - mozilla-foundation/common_voice_13_0 language: - hi metrics: - cer - wer library_name: transformers pipeline_tag: automatic-speech-recognition model-index: - name: whisper-small-hi-cv results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 15 type: mozilla-foundation/common_voice_15_0 args: hi metrics: - name: Test WER type: wer value: 13.9913 - name: Test CER type: cer value: 5.8844 - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: Common Voice 13 type: mozilla-foundation/common_voice_13_0 args: hi metrics: - name: Test WER type: wer value: 23.1361 - name: Test CER type: cer value: 10.4366 --- # whisper-small-hi-cv This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 15 dataset. It achieves the following results on the evaluation set: - Wer: 14.0178 - Cer: 05.8824 ## Evaluation ```python from datasets import load_dataset,load_metric,Audio from transformers import WhisperForConditionalGeneration, WhisperProcessor import torch import torchaudio test_dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test") wer = load_metric("wer") cer = load_metric("cer") processor = WhisperProcessor.from_pretrained("SakshiRathi77/whisper-hindi-kagglex") model = WhisperForConditionalGeneration.from_pretrained("SakshiRathi77/whisper-hindi-kagglex").to("cuda") test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000)) def map_to_pred(batch): audio = batch["audio"] input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features batch["reference"] = processor.tokenizer._normalize(batch['sentence']) with torch.no_grad(): predicted_ids = model.generate(input_features.to("cuda"))[0] transcription = processor.decode(predicted_ids) batch["prediction"] = processor.tokenizer._normalize(transcription) return batch result = test_dataset.map(map_to_pred) print("WER: {:2f}".format(100 * wer.compute(predictions=result["prediction"], references=result["reference"]))) print("CER: {:2f}".format(100 * cer.compute(predictions=result["prediction"], references=result["reference"]))) ``` ```bash WER: 23.1361 CER: 10.4366 ```