RuudVelo committed on
Commit
b11dd2f
2 Parent(s): 2f120f2 d7249dc

Merge branch 'main' of https://huggingface.co/RuudVelo/wav2vec2-large-xlsr-53-frisian into main

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
---
language: fy-NL
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: wav2vec2-large-xlsr-53-frisian by RuudVelo
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice fy-NL
      type: common_voice
      args: fy-NL
    metrics:
    - name: Test WER
      type: wer
      value: 20.35
---

## Evaluation on Common Voice Frisian Test

```python
import re

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "RuudVelo/wav2vec2-large-xlsr-53-frisian"
device = "cuda"
chars_to_ignore_regex = '[\\,\\?\\.\\!\\-\\;\\:\\"\\“\\%\\‘\\”\\�]'

model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2Processor.from_pretrained(model_name)

ds = load_dataset("common_voice", "fy-NL", split="test", data_dir="./cv-corpus-6.1-2020-12-11")

# Common Voice clips are 48 kHz; the model expects 16 kHz input.
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    # Load and resample the audio, and normalize the reference transcription.
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler.forward(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower() + " "
    return batch

ds = ds.map(map_to_array)

def map_to_pred(batch):
    # Greedy (argmax) CTC decoding of the model output for a batch of clips.
    features = processor(batch["speech"], sampling_rate=batch["sampling_rate"][0], padding=True, return_tensors="pt")
    input_values = features.input_values.to(device)
    attention_mask = features.attention_mask.to(device)
    with torch.no_grad():
        logits = model(input_values, attention_mask=attention_mask).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["predicted"] = processor.batch_decode(pred_ids)
    batch["target"] = batch["sentence"]
    return batch

result = ds.map(map_to_pred, batched=True, batch_size=16, remove_columns=list(ds.features.keys()))

wer = load_metric("wer")
print(wer.compute(predictions=result["predicted"], references=result["target"]))
```

**Result**: 20.35 %
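
## Inference example

The evaluation script above runs over the Common Voice test split; for transcribing a single recording, the snippet below is a minimal sketch. The file path `audio.wav` is a placeholder for a mono Frisian speech recording, and the audio is resampled to the 16 kHz rate the model expects.

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "RuudVelo/wav2vec2-large-xlsr-53-frisian"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)

# "audio.wav" is a placeholder path; a mono recording is assumed.
speech, sample_rate = torchaudio.load("audio.wav")
if sample_rate != 16_000:
    speech = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16_000)(speech)

inputs = processor(speech.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt", padding=True)
input_values = inputs.input_values.to(device)
attention_mask = inputs.get("attention_mask")

with torch.no_grad():
    if attention_mask is not None:
        logits = model(input_values, attention_mask=attention_mask.to(device)).logits
    else:
        logits = model(input_values).logits

# Same greedy CTC decoding as in the evaluation script.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```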