File size: 2,117 Bytes
8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 8bc572b f4ac233 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
---
language: en
datasets:
- librispeech_asr
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
license: apache-2.0
widget:
- example_title: Librispeech sample 1
src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Librispeech (clean)
type: librispeech_asr
args: en
metrics:
- name: Test WER
type: wer
value: 2.59
---
# Wav2Vec2-Base-960h + 4-gram
This model is identical to [Facebook's Wav2Vec2-Large-960h-lv60-self](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), but is
augmented with an English 4-gram. The `4-gram.arpa.gz` of [Librispeech's official ngrams](https://www.openslr.org/11) is used.
## Evaluation
This code snippet shows how to evaluate **patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram** on LibriSpeech's "clean" and "other" test data.
```python
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor
import torch
from jiwer import wer
model_id = "patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram"
librispeech_eval = load_dataset("librispeech_asr", "other", split="test")
model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)
def map_to_pred(batch):
inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
inputs = {k: v.to("cuda") for k,v in inputs.items()}
with torch.no_grad():
logits = model(**inputs).logits
transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
batch["transcription"] = transcription
return batch
result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])
print(wer(result["text"], result["transcription"]))
```
*Result (WER)*:
| "clean" | "other" |
|---|---|
| 1.84 | 3.71 | |