Noise Level

#9
by MemberDS - opened

I am using the Whisper Large speech recognition model to transcribe ATC (air traffic control) and RT (radio transmission) audio. It performs well up to a certain level of background noise, but as the noise level increases, the transcriptions become increasingly erroneous. I have two questions about this:

  1. What is the noise threshold, in decibels, that the model was trained on?
  2. How can I reduce the noise in the original file to increase the accuracy on noisy audio?

Hey @MemberDS! Sorry about the late reply here; that's a super interesting question about Whisper and noise levels. There are no details available about the noise levels in the training data, but you can find details about the model's performance under noise in Section 3.7 of the paper https://arxiv.org/pdf/2212.04356.pdf
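Section 3.7 reports performance as a function of signal-to-noise ratio (SNR) in dB rather than as an absolute noise level, so one practical way to answer question 1 for your own data is to build test files at known SNRs and see where the accuracy drops. Below is a minimal sketch, assuming you have a reasonably clean reference recording and a separate noise recording (e.g. radio static) as numpy arrays at the same sample rate. The helper names `snr_db` and `mix_at_snr` are hypothetical, and this is not the paper's evaluation code.

```python
import numpy as np


def snr_db(clean, noise):
    """SNR in decibels: 10 * log10(signal power / noise power)."""
    p_signal = np.mean(np.square(clean))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_signal / p_noise)


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (in dB)."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[: len(clean)]
    p_signal = np.mean(np.square(clean))
    p_noise = np.mean(np.square(noise))
    # After scaling, noise power becomes p_signal / 10**(target_snr_db / 10).
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return clean + scale * noise
```

For example, generating versions of the same clip at 20, 10, 5 and 0 dB and transcribing each one gives you an empirical threshold for your ATC recordings, which is likely more useful than a training-set figure.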

We recommend normalising the audio before passing it to the Whisper model (see https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperFeatureExtractor.__call__.do_normalize and https://github.com/huggingface/transformers/issues/19888).
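To our understanding, `do_normalize=True` applies zero-mean, unit-variance normalisation to the raw waveform before the log-Mel features are computed. If you want to inspect or apply that step yourself, here is a minimal numpy sketch; the function name `zero_mean_unit_var` and the `eps` value are our own assumptions, not the library's code.

```python
import numpy as np


def zero_mean_unit_var(audio, eps=1e-7):
    """Shift a waveform to zero mean and scale it to (approximately) unit variance."""
    audio = np.asarray(audio, dtype=np.float64)
    # eps guards against division by zero on silent (constant) input.
    return (audio - audio.mean()) / np.sqrt(audio.var() + eps)
```

In transformers itself you would instead pass the flag to the feature extractor, e.g. `feature_extractor(audio, sampling_rate=16000, do_normalize=True, return_tensors="pt")`.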

The noisereduce package (https://pypi.org/project/noisereduce/) provides a Python port of the Audacity noise reduction algorithm.
You can try applying it to your audio as a pre-processing step to reduce the overall input noise.
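noisereduce is built around spectral gating. To show roughly what that involves, here is a simplified numpy sketch of stationary spectral gating. It estimates a per-frequency threshold from a noise-only clip (mean plus `n_std` standard deviations of its dB spectrogram), suppresses time-frequency bins of the input that stay below that threshold, and resynthesises the waveform. This is not the package's implementation. The function names and defaults (`n_fft=512`, `hop=128`, `n_std=1.5`) are our own assumptions for illustration.

```python
import numpy as np


def _stft(x, n_fft, hop):
    """STFT with a periodic Hann window; returns an array of shape (frames, bins)."""
    window = np.hanning(n_fft + 1)[:-1]
    x = np.pad(x, n_fft // 2, mode="reflect")  # centre the first and last frames
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def _istft(spec, length, n_fft, hop):
    """Weighted overlap-add inverse of _stft, trimmed back to `length` samples."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out_len = n_fft + hop * (len(frames) - 1)
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    start = n_fft // 2
    return out[start:start + length]


def spectral_gate(signal, noise_clip, n_fft=512, hop=128, n_std=1.5, prop_decrease=1.0):
    """Attenuate time-frequency bins that fall below a noise-derived threshold."""
    # Per-frequency noise statistics (in dB) from a noise-only clip.
    noise_db = 20 * np.log10(np.abs(_stft(noise_clip, n_fft, hop)) + 1e-10)
    threshold = noise_db.mean(axis=0) + n_std * noise_db.std(axis=0)

    spec = _stft(signal, n_fft, hop)
    signal_db = 20 * np.log10(np.abs(spec) + 1e-10)
    # 1.0 where a bin rises above the noise threshold, 0.0 otherwise.
    mask = (signal_db > threshold).astype(float)
    # Smooth the mask over time (3-frame moving average) to soften abrupt gating.
    kernel = np.ones(3) / 3
    mask = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="same"), 0, mask)
    # prop_decrease=1.0 removes gated bins fully; smaller values only attenuate them.
    gain = 1.0 - prop_decrease * (1.0 - mask)
    return _istft(spec * gain, len(signal), n_fft, hop)
```

For ATC audio, a noise-only clip can often be taken from a pause between transmissions. For real use, the package itself exposes `noisereduce.reduce_noise(y=..., sr=...)`, which is the better starting point than this sketch.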
