Iskaj
/

xlsr300m_cv_8.0_nl

Automatic Speech Recognition

mozilla-foundation/common_voice_8_0

mozilla-foundation/common_voice_7_0

robust-speech-event

hf-asr-leaderboard

Inference Endpoints

Model card Files Files and versions Community

Iskaj commited on Jan 28, 2022

Commit

c0844d5

•

1 Parent(s): 2069130

Update README.md

Files changed (1) hide show

README.md +89 -1

README.md CHANGED Viewed

	@@ -1 +1,89 @@
1	- ~~Actually trained on CV 7.0, whoops...~~

+---
+language:
+- sv-SE
+license: apache-2.0
+tags:
+- automatic-speech-recognition
+- mozilla-foundation/common_voice_7_0
+- generated_from_trainer
+- nl
+- robust-speech-event
+- model_for_talk
+datasets:
+- mozilla-foundation/common_voice_7_0
+model-index:
+- name: XLS-R-300M - Dutch
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 7
+      type: mozilla-foundation/common_voice_7_0
+      args: nl
+    metrics:
+       - name: Test WER
+         type: wer
+         value: ???
+       - name: Test CER
+         type: cer
+         value: ???
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Robust Speech Event - Dev Data
+      type: speech-recognition-community-v2/dev_data
+      args: sv
+    metrics:
+       - name: Test WER
+         type: wer
+         value: ???
+       - name: Test CER
+         type: cer
+         value: ???
+---
+# xlsr300m_cv_8.0_nl
+#### Evaluation Commands
+1. To evaluate on `mozilla-foundation/common_voice_7_0` with split `test`
+```bash
+python eval.py --model_id Iskaj/xlsr300m_cv_8.0_nl --dataset mozilla-foundation/common_voice_8_0 --config nl --split test
+```
+2. To evaluate on `speech-recognition-community-v2/dev_data`
+```bash
+python eval.py --model_id Iskaj/xlsr300m_cv_8.0_nl --dataset speech-recognition-community-v2/dev_data --config nl --split validation --chunk_length_s 5.0 --stride_length_s 1.0
+```
+### Inference
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCTC, AutoProcessor
+import torchaudio.functional as F
+model_id = "Iskaj/xlsr300m_cv_8.0_nl"
+sample_iter = iter(load_dataset("mozilla-foundation/common_voice_8_0", "nl", split="test", streaming=True, use_auth_token=True))
+sample = next(sample_iter)
+resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+model = AutoModelForCTC.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id)
+inputs = processor(resampled_audio, sampling_rate=16_000, return_tensors="pt")
+with torch.no_grad():
+  logits = model(**inputs).logits
+  predicted_ids = torch.argmax(logits, dim=-1)
+  transcription = processor.batch_decode(predicted_ids)
+transcription[0].lower()
+#'het kontine schip lag aangemeert in de aven'
+```