kingabzpro committed
Commit 01bee33
1 Parent(s): ae30068

Create README.md

Files changed (1)
  1. README.md +93 -0
README.md ADDED
@@ -0,0 +1,93 @@
---
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_13_0
language:
- hi
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-small-hi-cv
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 15
      type: mozilla-foundation/common_voice_15_0
      args: hi
    metrics:
    - name: Test WER
      type: wer
      value: 13.9913
    - name: Test CER
      type: cer
      value: 5.8844
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 13
      type: mozilla-foundation/common_voice_13_0
      args: hi
    metrics:
    - name: Test WER
      type: wer
      value: 23.3824
    - name: Test CER
      type: cer
      value: 10.5288
---

# whisper-small-hi-cv

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 15 dataset.
It achieves the following results on the evaluation set:
- WER: 13.9913
- CER: 5.8844

View the results in the Kaggle notebook: https://www.kaggle.com/code/kingabzpro/whisper-hindi-eval

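## Usage

The snippet below is a minimal inference sketch, not taken from the training notebook: it assumes the standard `transformers` ASR `pipeline` and a local audio file, with `sample.wav` as a placeholder path.

```python
# Minimal inference sketch (assumption: a local audio file at "sample.wav").
# The ASR pipeline loads the audio, resamples it to the model's 16 kHz
# sampling rate, and returns the transcription text.
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-small-hi-cv",
    device=0,  # set device=-1 to run on CPU
)

print(transcriber("sample.wav")["text"])
```

For recordings longer than 30 seconds, pass `chunk_length_s=30` to the `pipeline(...)` constructor so the audio is transcribed in chunks.
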
## Evaluation

```python
import torch
import evaluate  # load_metric has moved from datasets to the evaluate package
from datasets import load_dataset, Audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the Common Voice 13 Hindi test split and the WER/CER metrics.
test_dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test")
wer = evaluate.load("wer")
cer = evaluate.load("cer")

processor = WhisperProcessor.from_pretrained("kingabzpro/whisper-small-hi-cv")
model = WhisperForConditionalGeneration.from_pretrained("kingabzpro/whisper-small-hi-cv").to("cuda")

# Whisper expects 16 kHz audio, so resample the dataset on the fly.
test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))

def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    ).input_features
    batch["reference"] = processor.tokenizer._normalize(batch["sentence"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]
    transcription = processor.decode(predicted_ids, skip_special_tokens=True)
    batch["prediction"] = processor.tokenizer._normalize(transcription)
    return batch

result = test_dataset.map(map_to_pred)

print("WER: {:.4f}".format(100 * wer.compute(predictions=result["prediction"], references=result["reference"])))
print("CER: {:.4f}".format(100 * cer.compute(predictions=result["prediction"], references=result["reference"])))
```
```bash
WER: 23.3824
CER: 10.5288
```