andreagasparini
committed
Commit d02ba4f • 1 Parent(s): 54074b1
Fixes evaluation instructions and updates WER scores
Hi, I was trying to evaluate the model on LibriSpeech's "clean" and "other" test data following the code snippet in the model card, but I got a `TypeError` in the `map_to_pred` function because the transcriptions are stored in the batch wrapped in lists instead of as plain strings (e.g. `["transcription example"]` instead of `"transcription example"`):

`TypeError: expected string or bytes-like object`

After fixing the error I recomputed the WER and updated the scores without approximating them. I think the same should be done for other wav2vec2-based models (e.g. facebook/wav2vec2-large-960h-lv60).
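To make the failure mode concrete, here is a minimal sketch with made-up toy sentences (not actual LibriSpeech data): jiwer's default text transforms apply regex operations that expect plain strings, so a column of one-element lists triggers the `TypeError` above, and unwrapping with `[0]` resolves it.

```python
# Minimal sketch of the bug with toy data (hypothetical sentences, not
# LibriSpeech): jiwer's regex-based transforms expect plain strings.
from jiwer import wer

references = ["a transcription example", "another transcription example"]

# What the old snippet stored: processor.batch_decode() returns a list,
# so every entry in the "transcription" column is a one-element list.
broken = [["a transcription example"], ["another transcription example"]]

# wer(references, broken)  # TypeError: expected string or bytes-like object

# The fix in this commit: unwrap with [0] so plain strings are stored.
fixed = [hypothesis[0] for hypothesis in broken]
print("WER:", wer(references, fixed))  # -> WER: 0.0
```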
README.md CHANGED

@@ -24,7 +24,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 1.9
+      value: 1.86
   - task:
       name: Automatic Speech Recognition
       type: automatic-speech-recognition
@@ -38,7 +38,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 3.9
+      value: 3.88
 ---
 
 # Wav2Vec2-Large-960h-Lv60 + Self-Training
@@ -85,9 +85,9 @@ To transcribe audio files the model can be used as a standalone acoustic model a
 transcription = processor.batch_decode(predicted_ids)
 ```
 
-
+## Evaluation
 
-
+This code snippet shows how to evaluate **facebook/wav2vec2-large-960h-lv60-self** on LibriSpeech's "clean" and "other" test data.
 
 ```python
 from datasets import load_dataset
@@ -103,14 +103,14 @@ processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60
 
 def map_to_pred(batch):
     inputs = processor(batch["audio"]["array"], return_tensors="pt", padding="longest")
-    input_values = inputs.input_values.to("cuda")
+    input_values = inputs.input_values.to("cuda")
     attention_mask = inputs.attention_mask.to("cuda")
 
     with torch.no_grad():
         logits = model(input_values, attention_mask=attention_mask).logits
 
     predicted_ids = torch.argmax(logits, dim=-1)
-    transcription = processor.batch_decode(predicted_ids)
+    transcription = processor.batch_decode(predicted_ids)[0]
     batch["transcription"] = transcription
     return batch
 
@@ -123,4 +123,4 @@ print("WER:", wer(result["text"], result["transcription"]))
 
 | "clean" | "other" |
 |---|---|
-| 1.9 | 3.9 |
+| 1.86 | 3.88 |
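For reference, here is the fixed evaluation snippet assembled from the hunks above. The context lines not shown in the diff (imports, dataset and model loading, and the final `map`/`wer` call) are reconstructed from the hunk headers and the usual wav2vec2 model-card scaffolding, so treat those parts as an assumption rather than a verbatim copy of the README; running it requires a CUDA-capable GPU.

```python
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
from jiwer import wer

# "clean" test split; swap in "other" for the second WER figure
librispeech_eval = load_dataset("librispeech_asr", "clean", split="test")

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self").to("cuda")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

def map_to_pred(batch):
    inputs = processor(batch["audio"]["array"], return_tensors="pt", padding="longest")
    input_values = inputs.input_values.to("cuda")
    attention_mask = inputs.attention_mask.to("cuda")

    with torch.no_grad():
        logits = model(input_values, attention_mask=attention_mask).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    # batch_decode returns a list; [0] unwraps it so the stored column
    # holds plain strings and jiwer's wer() does not raise a TypeError
    transcription = processor.batch_decode(predicted_ids)[0]
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

print("WER:", wer(result["text"], result["transcription"]))
```

Running this once per configuration, with `"clean"` swapped for `"other"` in `load_dataset`, is how the 1.86 and 3.88 figures in the table were recomputed.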