File size: 2,424 Bytes
996832f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e7617d1
996832f
 
 
 
 
edb5c3d
996832f
edb5c3d
 
 
 
996832f
 
 
e7617d1
edb5c3d
 
 
 
 
 
 
 
 
 
 
 
 
 
996832f
 
301ef17
996832f
301ef17
 
996832f
 
 
301ef17
996832f
 
 
e7617d1
996832f
 
 
e7617d1
996832f
e7617d1
996832f
e7617d1
 
996832f
 
e7617d1
 
 
 
996832f
e7617d1
996832f
e7617d1
996832f
 
 
e7617d1
996832f
e7617d1
996832f
 
 
 
 
 
e7617d1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
language: en
datasets:
- librispeech_asr
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
license: apache-2.0
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: patrickvonplaten/wav2vec2-base-960h-4-gram
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: clean
      split: test
      args: 
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 2.59
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args: 
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 6.46
---

# Wav2Vec2-Base-960h + 4-gram

This model is identical to [Facebook's Wav2Vec2-Base-960h](https://huggingface.co/facebook/wav2vec2-base-960h), but is 
augmented with an English 4-gram. The `4-gram.arpa.gz` of [Librispeech's official ngrams](https://www.openslr.org/11) is used.
 
 ## Evaluation
 
 This code snippet shows how to evaluate **patrickvonplaten/wav2vec2-base-960h-4-gram** on LibriSpeech's "clean" and "other" test data.
 
```python
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor
import torch
from jiwer import wer

model_id = "patrickvonplaten/wav2vec2-base-960h-4-gram"

librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

def map_to_pred(batch):
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

    inputs = {k: v.to("cuda") for k,v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

print(wer(result["text"], result["transcription"]))
```

*Result (WER)*:

| "clean" | "other" |
|---|---|
| 2.59 | 6.46 |