kingabzpro/wav2vec2-large-xls-r-300m-hi

@@ -5,11 +5,45 @@ tags:
 - generated_from_trainer
 metrics:
 - wer
 model-index:
 - name: wav2vec2-large-xls-r-300m-hi
-  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -21,20 +55,61 @@ It achieves the following results on the evaluation set:
 - Wer: 0.2992
 - Cer: 0.0786
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
@@ -65,4 +140,4 @@ The following hyperparameters were used during training:
 - Transformers 4.33.0
 - Pytorch 2.0.0
 - Datasets 2.1.0
-- Tokenizers 0.13.3

 - generated_from_trainer
 metrics:
 - wer
+- cer
 model-index:
 - name: wav2vec2-large-xls-r-300m-hi
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 15
+      type: mozilla-foundation/common_voice_15_0
+      args: hi
+    metrics:
+      - name: Test WER
+        type: wer
+        value: 0.2934
+      - name: Test CER
+        type: cer
+        value: 0.0786
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 8
+      type: mozilla-foundation/common_voice_8_0
+      args: hi
+    metrics:
+      - name: Test WER
+        type: wer
+        value: 0.5209
+      - name: Test CER
+        type: cer
+        value: 0.1790
+datasets:
+- mozilla-foundation/common_voice_15_0
+language:
+- hi
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 - Wer: 0.2992
 - Cer: 0.0786
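
The Wer and Cer figures above are edit-distance ratios: the number of word-level (or character-level) substitutions, insertions, and deletions needed to turn the prediction into the reference, divided by the reference length. The following is a minimal pure-Python sketch of that definition, assuming corpus-level aggregation (errors summed over all utterances before dividing). The helper names `edit_distance` and `error_rate` are hypothetical and are not part of `datasets` or `transformers`; the metric implementations used in the card may differ in details such as whitespace handling.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, predictions, unit="word"):
    """Total edits divided by total reference length, over words or characters."""
    split = (lambda s: s.split()) if unit == "word" else list
    errors = sum(edit_distance(split(r), split(p))
                 for r, p in zip(references, predictions))
    total = sum(len(split(r)) for r in references)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


refs = ["मैं घर जा रहा हूँ"]
hyps = ["मैं घर जा रहा हूं"]
print(error_rate(refs, hyps, unit="word"))  # one of the five words differs
print(error_rate(refs, hyps, unit="char"))  # a single code point differs
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same predictions, which matches the gap between the two figures reported here.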
+## Evaluation
+```python
+import torch
+from datasets import load_dataset, load_metric
+from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+import librosa
+import unicodedata
+import re
+test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "hi", split="test")
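+# Note: load_metric is deprecated in newer releases of `datasets`; it works with the Datasets version listed in this card.
+wer = load_metric("wer")
+cer = load_metric("cer")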
+processor = Wav2Vec2Processor.from_pretrained("kingabzpro/wav2vec2-large-xls-r-300m-hi")
+model = Wav2Vec2ForCTC.from_pretrained("kingabzpro/wav2vec2-large-xls-r-300m-hi")
+model.to("cuda")
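+# Normalize the reference text and read each audio file into a 16 kHz array.
+def speech_file_to_array_fn(batch):
+    # Raw strings avoid invalid-escape warnings; the character set is unchanged.
+    chars_to_ignore_regex = r'[,?.!\-;:"“%‘”�’\'|&–]'
+    remove_en = r'[A-Za-z]'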
+    batch["sentence"] = re.sub(chars_to_ignore_regex, "", batch["sentence"].lower())
+    batch["sentence"] = re.sub(remove_en, "", batch["sentence"]).lower()
+    batch["sentence"] = unicodedata.normalize("NFKC", batch["sentence"])
+    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
+    batch["speech"] = speech_array
+    return batch
+test_dataset = test_dataset.map(speech_file_to_array_fn)
+# Run batched inference and decode the predicted token IDs to text.
+def evaluate(batch):
+    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
+    with torch.no_grad():
+        logits = model(inputs.input_values.to("cuda")).logits
+    pred_ids = torch.argmax(logits, dim=-1)
+    batch["pred_strings"] = processor.batch_decode(pred_ids, skip_special_tokens=True)
+    return batch
+result = test_dataset.map(evaluate, batched=True, batch_size=8)
+print("WER: {}".format(wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
+print("CER: {}".format(cer.compute(predictions=result["pred_strings"], references=result["sentence"])))
+```
+**WER: 0.5209850206372026**
+**CER: 0.17902923538230883**
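
The scores depend on how the reference sentences are normalized before scoring, so it can help to inspect that step on its own. The following sketch isolates the text cleaning from the evaluation script (punctuation removal, Latin-letter removal, NFKC normalization) so it can be tried without downloading the dataset or the model. The function name `normalize_sentence` is hypothetical, and the sketch omits the replacement character from the ignore set, so treat it as an approximation of the script's behavior.

```python
import re
import unicodedata

# Punctuation stripped from the reference transcripts (assumed subset of the
# character class used in the evaluation script).
CHARS_TO_IGNORE = r'[,?.!\-;:"“%‘”’\'|&–]'
LATIN_LETTERS = r"[A-Za-z]"


def normalize_sentence(text):
    """Strip punctuation and Latin letters, then apply NFKC normalization."""
    text = re.sub(CHARS_TO_IGNORE, "", text)
    text = re.sub(LATIN_LETTERS, "", text)
    return unicodedata.normalize("NFKC", text)


print(normalize_sentence("नमस्ते, दुनिया!"))
```

Whitespace is left as it is, so removing a Latin word leaves its surrounding spaces in place, as in the original script.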
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - Transformers 4.33.0
 - Pytorch 2.0.0
 - Datasets 2.1.0
+- Tokenizers 0.13.3