andreagasparini committed
Commit d02ba4f
1 Parent(s): 54074b1

Fixes evaluation instructions and updates WER scores


Hi, I was trying to evaluate the model on LibriSpeech's "clean" and "other" test data following the code snippet in the model card, but I got a `TypeError` in the `map_to_pred` function because the transcriptions are stored in the batch wrapped in lists instead of as plain strings (e.g. `["transcription example"]` instead of `"transcription example"`):

`TypeError: expected string or bytes-like object`

After fixing the error I recomputed the WER and updated the scores without rounding them. I think the same should be done for the other wav2vec2-based models (e.g. facebook/wav2vec2-large-960h-lv60).
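
To make the failure mode concrete, here is a minimal sketch (the token ids below are dummy values I made up for illustration, and I'm assuming the `wer` used in the snippet comes from `jiwer`):

```python
import torch
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

# batch_decode() always returns a *list* of strings, even for a single utterance
predicted_ids = torch.tensor([[7, 8, 5, 5, 0]])  # dummy CTC token ids for one utterance
print(processor.batch_decode(predicted_ids))     # a list, e.g. ['...']
print(processor.batch_decode(predicted_ids)[0])  # a plain string, e.g. '...'

# Storing the list in batch["transcription"] means wer() later receives a list
# of lists; jiwer normalizes its inputs with regex substitutions, which is what
# raises "TypeError: expected string or bytes-like object".
```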

Files changed (1): README.md (+7 -7)
README.md CHANGED
@@ -24,7 +24,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 1.9
+      value: 1.86
   - task:
       name: Automatic Speech Recognition
       type: automatic-speech-recognition
@@ -38,7 +38,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 3.9
+      value: 3.88
 ---
 
 # Wav2Vec2-Large-960h-Lv60 + Self-Training
@@ -85,9 +85,9 @@ To transcribe audio files the model can be used as a standalone acoustic model a
 transcription = processor.batch_decode(predicted_ids)
 ```
 
-## Evaluation
+## Evaluation
 
-This code snippet shows how to evaluate **facebook/wav2vec2-large-960h-lv60-self** on LibriSpeech's "clean" and "other" test data.
+This code snippet shows how to evaluate **facebook/wav2vec2-large-960h-lv60-self** on LibriSpeech's "clean" and "other" test data.
 
 ```python
 from datasets import load_dataset
@@ -103,14 +103,14 @@ processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60
 
 def map_to_pred(batch):
     inputs = processor(batch["audio"]["array"], return_tensors="pt", padding="longest")
-    input_values = inputs.input_values.to("cuda")
+    input_values = inputs.input_values.to("cuda")
     attention_mask = inputs.attention_mask.to("cuda")
 
     with torch.no_grad():
         logits = model(input_values, attention_mask=attention_mask).logits
 
     predicted_ids = torch.argmax(logits, dim=-1)
-    transcription = processor.batch_decode(predicted_ids)
+    transcription = processor.batch_decode(predicted_ids)[0]
     batch["transcription"] = transcription
     return batch
 
@@ -123,4 +123,4 @@ print("WER:", wer(result["text"], result["transcription"]))
 
 | "clean" | "other" |
 |---|---|
-| 1.9 | 3.9 |
+| 1.86 | 3.88 |
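
For reference, this is a self-contained version of the corrected snippet that reproduces both table entries in one run; it assumes `wer` comes from `jiwer` (as in the model card) and a CUDA-capable GPU:

```python
import torch
from datasets import load_dataset
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self").to("cuda")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

def map_to_pred(batch):
    inputs = processor(batch["audio"]["array"], return_tensors="pt", padding="longest")
    input_values = inputs.input_values.to("cuda")
    attention_mask = inputs.attention_mask.to("cuda")

    with torch.no_grad():
        logits = model(input_values, attention_mask=attention_mask).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    # [0] unwraps the single-element list returned by batch_decode()
    batch["transcription"] = processor.batch_decode(predicted_ids)[0]
    return batch

for config in ["clean", "other"]:
    ds = load_dataset("librispeech_asr", config, split="test")
    result = ds.map(map_to_pred, remove_columns=["audio"])
    print(config, "WER:", wer(result["text"], result["transcription"]))
```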