Removes annoying warning: missing "sampling_rate" argument

by andreagasparini - opened Jul 6, 2022

base: refs/heads/main

←

from: refs/pr/1


andreagasparini

Jul 6, 2022

•

edited Jul 7, 2022

Hi, I was trying to evaluate the model on LibriSpeech's "clean" and "other" test data following the code snippet in the model card, and I got an annoying warning about the missing sampling_rate argument in the processor call. It is printed at every call of the map_to_pred function:

"It is strongly recommended to pass the sampling_rate argument to this function. Failing to do so can result in silent errors that might be hard to debug."

I have changed the snippet to read the sampling rate from the dataset itself. I think the same argument is missing in similar model cards as well (e.g. facebook/wav2vec2-large-960h-lv60).

Removes annoying warning: missing "sampling_rate" argument (commit 750319d4)

sanchit-gandhi

Jul 28, 2022

•

edited Jul 28, 2022

Hey @andreagasparini, for LibriSpeech the sampling_rate is 16kHz, which is equal to the sampling rate of the Wav2Vec2 feature extractor. In this case, passing the sampling rate of the raw dataset (16kHz) to the processor is completely valid. If the sampling_rate of the dataset != 16kHz, we would have to resample:

import datasets

# read the sampling rate of the dataset from its "audio" feature
dataset_sampling_rate = next(iter(raw_datasets.values())).features["audio"].sampling_rate

# check if sampling rates match
if dataset_sampling_rate != processor.feature_extractor.sampling_rate:
    # resample if necessary
    raw_datasets = raw_datasets.cast_column(
        "audio", datasets.features.Audio(sampling_rate=processor.feature_extractor.sampling_rate)
    )
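To make concrete what resampling does to the raw audio array, here is a minimal sketch in plain Python. It is an illustration only: `resample_linear` is a hypothetical helper that uses naive linear interpolation, which is not the method the `datasets` library applies when casting the audio column, and it does no low-pass filtering, so downsampling real audio this way would introduce aliasing.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample `samples` from `orig_sr` to `target_sr` by linear interpolation.

    Hypothetical helper for illustration only; real pipelines should use a
    proper resampler with anti-aliasing.
    """
    if orig_sr == target_sr:
        return list(samples)

    # The clip keeps its duration, so the sample count scales with the rate.
    n_out = int(len(samples) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr      # fractional index into the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# Halving the rate keeps every second sample of this toy signal.
halved = resample_linear([0.0, 1.0, 2.0, 3.0], orig_sr=32_000, target_sr=16_000)
print(halved)  # [0.0, 2.0]
```

The point of the sketch is that the array length changes with the sampling rate, which is why a model trained on 16kHz audio misreads input recorded at another rate unless it is resampled first.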

I'd be happy to merge this PR provided we prepend your proposed change with a one line comment explaining why passing the sampling rate is valid:

...
# LibriSpeech sampling rate (16kHz) is equal to Wav2Vec2 processor sampling rate -> pass audio directly to processor
inputs = processor(batch["audio"]["array"], return_tensors="pt", padding="longest", sampling_rate=batch["audio"]["sampling_rate"])
...

Let me know what you think!

andreagasparini

Aug 9, 2022

•

edited Aug 9, 2022

Hi @sanchit-gandhi, indeed you are right!
I put it that way because the code snippet refers to "how to evaluate facebook/wav2vec2-large-960h-lv60-self on LibriSpeech's "clean" and "other" test data". Anyway, I totally agree with adding your comment: explicitly saying why we do so cannot hurt.
Maybe I would also add there that if the sampling rates do not match, we have to resample before using the model.
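One way to make a mismatch fail loudly instead of silently is a small guard that runs before the processor call. The sketch below is an assumption-laden illustration: `ensure_sampling_rate` is a hypothetical helper (not part of transformers or datasets), the audio dicts only mimic the `array` / `sampling_rate` layout of a decoded dataset example, and the hard-coded 16000 stands in for `processor.feature_extractor.sampling_rate`.

```python
# Assumption: stands in for processor.feature_extractor.sampling_rate.
EXPECTED_SAMPLING_RATE = 16_000


def ensure_sampling_rate(audio, expected=EXPECTED_SAMPLING_RATE):
    """Return the audio's sampling rate, or raise if it differs from `expected`.

    `audio` is assumed to be a dict with "array" and "sampling_rate" keys,
    mirroring a decoded audio example.
    """
    actual = audio["sampling_rate"]
    if actual != expected:
        raise ValueError(
            f"Audio is sampled at {actual} Hz but the model expects {expected} Hz; "
            "resample the dataset before running inference."
        )
    return actual


# Hypothetical examples: one at LibriSpeech's 16 kHz, one at 44.1 kHz.
librispeech_like = {"array": [0.0, 0.1, -0.1], "sampling_rate": 16_000}
cd_quality = {"array": [0.0, 0.1, -0.1], "sampling_rate": 44_100}

matched_rate = ensure_sampling_rate(librispeech_like)

try:
    ensure_sampling_rate(cd_quality)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Inside map_to_pred, the returned rate could then be passed as the `sampling_rate` argument of the processor call, so the value is both checked and forwarded.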

sanchit-gandhi

Aug 9, 2022

•

edited Aug 9, 2022

Hey @andreagasparini ,

That'd be great, and would certainly help in avoiding preventable errors when the feature extractor sampling rate != audio sampling rate!

Thank you!

Adds comment explaination of when to resample (commit d779cee8)


Ready to merge

This branch is ready to get merged automatically.
