Removes annoying warning: missing "sampling_rate" argument

#1
by andreagasparini - opened

Hi, I was trying to evaluate the model on LibriSpeech's "clean" and "other" test data following the code snippet in the Model card and I got an annoying warning about the missing sampling_rate argument in the processor call, printed at every call of the map_to_pred function.

"It is strongly recommended to pass the sampling_rate argument to this function. Failing to do so can result in silent errors that might be hard to debug."

I have changed the snippet by simply reading the sampling rate from the dataset itself. I think the same argument is missing in similar Model cards as well (e.g. facebook/wav2vec2-large-960h-lv60).

Hey @andreagasparini , for LibriSpeech (LS) the sampling_rate is 16kHz, which is equal to the sampling rate of the Wav2Vec2 (W2V2) feature extractor. In this case, passing the sampling rate of the raw dataset (16kHz) to the processor is completely valid. If the sampling_rate of the dataset != 16kHz, we would have to resample:

import datasets

# get the sampling rate of the dataset
dataset_sampling_rate = next(iter(raw_datasets.values())).features["audio"].sampling_rate

# check whether the sampling rates match
if dataset_sampling_rate != processor.feature_extractor.sampling_rate:
    # resample if necessary
    raw_datasets = raw_datasets.cast_column(
        "audio", datasets.features.Audio(sampling_rate=processor.feature_extractor.sampling_rate)
    )

I'd be happy to merge this PR provided we prepend your proposed change with a one line comment explaining why passing the sampling rate is valid:

...
# LibriSpeech sampling rate (16kHz) is equal to Wav2Vec2 processor sampling rate -> pass audio directly to processor
inputs = processor(batch["audio"]["array"], return_tensors="pt", padding="longest", sampling_rate=batch["audio"]["sampling_rate"])
...

Let me know what you think!

Hi @sanchit-gandhi , indeed you are right!
I put it like that directly since the code snippet refers to how to evaluate facebook/wav2vec2-large-960h-lv60-self on LibriSpeech's "clean" and "other" test data. Anyway, I totally agree with adding your comment; explicitly saying why we do so cannot hurt.
Maybe I would also add there that if the sampling rates do not match we have to resample before using the model.
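For illustration, the mismatch case could also be caught with a small guard before the processor call. This is only a sketch: the helper name `resolve_sampling_rate` and its `expected_rate` parameter are hypothetical and not part of the Model card or of transformers. In real use, `expected_rate` would be taken from `processor.feature_extractor.sampling_rate`.

```python
def resolve_sampling_rate(audio: dict, expected_rate: int = 16_000) -> int:
    """Return the sampling rate to pass to the processor, or raise on a mismatch.

    Hypothetical helper: `audio` is one entry of the dataset's "audio" column,
    i.e. a dict that has at least a "sampling_rate" key.
    """
    rate = audio["sampling_rate"]
    if rate != expected_rate:
        # Fail loudly instead of feeding wrongly sampled audio to the model.
        raise ValueError(
            f"Audio sampling rate is {rate} Hz but the feature extractor expects "
            f"{expected_rate} Hz; resample the dataset first."
        )
    return rate


# Stand-in for a single dataset example (assumed shape, not real LibriSpeech data).
sample = {"array": [0.0, 0.1, -0.1], "sampling_rate": 16_000}
sampling_rate = resolve_sampling_rate(sample)
```

The returned value can then be passed as the `sampling_rate` argument of the processor call, so a dataset with a different rate raises an error instead of silently producing bad transcriptions.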

Hey @andreagasparini ,

That'd be great, and would certainly help in avoiding preventable errors when the feature extractor sampling rate != audio sampling rate!

Thank you!
