Mismatching WER and small bugs
Hi, I ran into some issues while running the sample code. When I ran it directly, I got the following error:
```
TypeError                                 Traceback (most recent call last)
     19     batch["transcription"] = transcription
     20     return batch
---> 22 result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1, remove_columns=["audio"])
     24 #flattened_list = [item[0] for item in result["transcription"]]
     26 print("WER:", wer(result["text"], result["transcription"]))

File \datasets\arrow_dataset.py:592, in transmit_tasks.<locals>.wrapper(*args, **kwargs)
    590 self: "Dataset" = kwargs.pop("self")
    591 # apply actual function
--> 592 out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
    593 datasets: List["Dataset"] = list(out.values()) if isinstance(out, dict) else [out]
    594 for dataset in datasets:
    595     # Remove task templates if a column mapping of the template is no longer valid

File \datasets\arrow_dataset.py:557, in transmit_format.<locals>.wrapper(*args, **kwargs)
    550 self_format = {
    551     "type": self._format_type,
    552     "format_kwargs": self._format_kwargs,
    553     "columns": self._format_columns,
    554     "output_all_columns": self._output_all_columns,
    555 }
    556 # apply actual function
...
---> 13 input_values = processor(batch["audio"]["array"], return_tensors="pt", padding="longest").input_values
     14 with torch.no_grad():
     15     logits = model(input_values.to("cuda")).logits

TypeError: list indices must be integers or slices, not str
```
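As far as I can tell, the error comes from how `batched=True` changes what `map` passes to the function: each column arrives as a *list* of values, so `batch["audio"]` is a list of dicts, and `batch["audio"]["array"]` indexes a list with a string. A minimal illustration with plain Python dicts (no `datasets` needed):

```python
# Under batched=True, each column is passed as a list of values,
# so "audio" is a list of dicts rather than a single dict.
batched = {"audio": [{"array": [0.1, 0.2], "sampling_rate": 16000}]}

try:
    batched["audio"]["array"]  # indexing a list with a string
except TypeError as e:
    print(e)  # list indices must be integers or slices, not str

# Indexing into the list (or iterating over it) works:
arrays = [item["array"] for item in batched["audio"]]
print(arrays)  # [[0.1, 0.2]]
```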
So I modified the sample code as:
```python
result = librispeech_eval.map(map_to_pred, batched=False, batch_size=1, remove_columns=["audio"])

flattened_list = [item[0] for item in result["transcription"]]

print("WER:", wer(result["text"], flattened_list))
```
Then I got a mismatched WER:

```
It is strongly recommended to pass the ``sampling_rate`` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
WER: 0.0338557516737675
```
Is this caused by not passing the sampling rate?
Would you know what might be happening @sanchit-gandhi ?
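For context on the number itself: WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal pure-Python sketch of that definition (jiwer computes the same quantity, with extra options for text normalization):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
print(word_error_rate("the cat sat", "the bat sat"))  # one substitution over three words
```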
The sample code I ran is this one:
Evaluation
This code snippet shows how to evaluate facebook/wav2vec2-base-960h on LibriSpeech's "clean" and "other" test data.
```python
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
from jiwer import wer

librispeech_eval = load_dataset("librispeech_asr", "clean", split="test")

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h").to("cuda")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")

def map_to_pred(batch):
    input_values = processor(batch["audio"]["array"], return_tensors="pt", padding="longest").input_values
    with torch.no_grad():
        logits = model(input_values.to("cuda")).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1, remove_columns=["audio"])

print("WER:", wer(result["text"], result["transcription"]))
```
Hey @yourex! You're correct that the code snippet is broken. I've opened a PR to correct the code snippet and activate batching here: #10 (and for the large model here: PR 4, and for the self-trained model here: PR 6). Perhaps @patrickvonplaten could merge these PRs? (Wav2Vec2 is maintained entirely by HF on the Hub.)

The WER you calculated with your modified code snippet is entirely correct. You obtained a WER of 0.03385, or 3.4%, which matches the expected results (see the bottom of the Evaluation section). The sampling rate of LibriSpeech is 16 kHz, which matches the sampling rate of the Wav2Vec2 feature extractor, so in this case there are no pre-processing errors. However, it is good practice to pass the sampling rate to prevent silent errors, as is done in the updated code snippet.