Stop Whisper predicting phone numbers.

#74

by Tyler1992 - opened Sep 29, 2023

Tyler1992

Sep 29, 2023

Is there a way to stop Whisper from predicting a complete phone number when the audio cuts off partway through one? I am using the pipeline on English audio. Unfortunately, I cannot share any audio samples.

EXAMPLE:
(Actual Audio) If you want to continue to receive these calls, you must contact us for credit card deposits or other deposit instructions at 1 800-
(Predicted transcription) If you want to continue to receive these calls, you must contact our us for credit card deposits or other deposit instructions at 1 800-564-8989.

sanchit-gandhi

Oct 2, 2023

Hey @Tyler1992! Could you use Whisper with word-level timestamps (set return_timestamps="word" when you call the pipeline) and cut off the transcription at the timestamp that corresponds to the end of the audio recording?
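A minimal sketch of that suggestion, assuming the pipeline returns word chunks as a list of dicts with "text" and a "timestamp" (start, end) tuple in seconds, like the output posted later in this thread. `trim_chunks` and `transcribe_and_trim` are hypothetical helper names, and `margin` (seconds shaved off the clip end before cutting) is an illustrative knob, not a pipeline option.

```python
from typing import Any, Callable, Dict, List, Sequence

Chunk = Dict[str, Any]


def trim_chunks(chunks: List[Chunk], audio_duration: float, margin: float = 0.0) -> List[Chunk]:
    """Cut the word list at the first word that ends after the audio does.

    Each chunk is assumed to look like {"text": " word", "timestamp": (start, end)},
    with times in seconds; the end time may be None for the last word.
    """
    cutoff = audio_duration - margin
    kept: List[Chunk] = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        word_end = end if end is not None else start  # fall back to the start time
        if word_end is None or word_end > cutoff:
            break  # stop here: drop this word and every word after it
        kept.append(chunk)
    return kept


def transcribe_and_trim(
    asr: Callable[..., Dict[str, Any]],
    audio: Sequence[float],
    sampling_rate: int,
    margin: float = 0.0,
) -> str:
    """Transcribe with word-level timestamps, then drop words past the end of the clip.

    `asr` is assumed to be a transformers automatic-speech-recognition pipeline;
    `audio` is a mono 1-D array (e.g. a NumPy array) sampled at `sampling_rate` Hz.
    """
    result = asr({"raw": audio, "sampling_rate": sampling_rate}, return_timestamps="word")
    duration = len(audio) / sampling_rate  # clip length in seconds
    kept = trim_chunks(result["chunks"], duration, margin)
    return "".join(chunk["text"] for chunk in kept).strip()
```

For example, create the pipeline once with `asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")` (any Whisper checkpoint should work) and pass it in. Cutting at the first out-of-range word, rather than filtering words one by one, drops everything after it, which matches the "cut off the transcription" idea. If hallucinated digits end up aligned to the last fraction of a second of the clip, a small positive `margin` may help; that is an assumption worth checking on your own audio.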

Tyler1992

Oct 2, 2023

@sanchit-gandhi I will definitely give this a try. Thanks!

Tyler1992

Oct 2, 2023

edited Oct 2, 2023

@sanchit-gandhi Do you have any idea why the predicted word timestamps are about 26 seconds past the end of the audio clip?

Total audio duration for the example below is 10.32 seconds:
{
    'text': ' This is Jack Parsons with Custom Boat Premise. How are you doing today?',
    'chunks': [
        {'text': ' This', 'timestamp': (0.0, 7.76)},
        {'text': ' is', 'timestamp': (7.76, 7.94)},
        {'text': ' Jack', 'timestamp': (7.94, 8.22)},
        {'text': ' Parsons', 'timestamp': (8.22, 8.46)},
        {'text': ' with', 'timestamp': (8.46, 8.7)},
        {'text': ' Custom', 'timestamp': (36.65, 36.65)},
        {'text': ' Boat', 'timestamp': (36.65, 36.65)},
        {'text': ' Premise.', 'timestamp': (36.65, 36.65)},
        {'text': ' How', 'timestamp': (36.65, 36.65)},
        {'text': ' are', 'timestamp': (36.65, 36.65)},
        {'text': ' you', 'timestamp': (36.65, 36.65)},
        {'text': ' doing', 'timestamp': (36.65, 36.65)},
        {'text': ' today?', 'timestamp': (36.65, 36.65)}
    ]
}
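To make the size of the offset concrete, the check below flags every word whose end timestamp lies past the 10.32 s clip length, using the chunks from the output above. `out_of_range_words` is a hypothetical helper, not part of transformers.

```python
from typing import Any, Dict, List, Tuple

# (text, start, end) triples copied from the pipeline output posted above.
REPORTED = [
    (" This", 0.0, 7.76), (" is", 7.76, 7.94), (" Jack", 7.94, 8.22),
    (" Parsons", 8.22, 8.46), (" with", 8.46, 8.7), (" Custom", 36.65, 36.65),
    (" Boat", 36.65, 36.65), (" Premise.", 36.65, 36.65), (" How", 36.65, 36.65),
    (" are", 36.65, 36.65), (" you", 36.65, 36.65), (" doing", 36.65, 36.65),
    (" today?", 36.65, 36.65),
]
AUDIO_DURATION = 10.32  # seconds, as stated in the post


def out_of_range_words(
    chunks: List[Dict[str, Any]], audio_duration: float
) -> List[Tuple[str, float]]:
    """Return (word, seconds past the end of the clip) for each word that ends too late."""
    flagged = []
    for chunk in chunks:
        _, end = chunk["timestamp"]
        if end is not None and end > audio_duration:
            flagged.append((chunk["text"], round(end - audio_duration, 2)))
    return flagged


chunks = [{"text": t, "timestamp": (s, e)} for t, s, e in REPORTED]
flagged = out_of_range_words(chunks, AUDIO_DURATION)
```

On this data it flags the eight words from ' Custom' onward, each 26.33 s past the end of the clip (36.65 - 10.32).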

sanchit-gandhi

Oct 5, 2023

edited Oct 5, 2023

Interesting! Could you share your Transformers version, please? You can copy and paste the output of:

transformers-cli env

I ask because we recently made a bug fix to word-level timestamps for an issue that looks similar to your results: https://github.com/huggingface/transformers/pull/25607

If you're not on the latest version, I recommend you try updating with:

pip install --upgrade transformers

Then re-run your code snippet to check whether you get different results.
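If you only need the version number from Python rather than the full `transformers-cli env` report, here is a minimal sketch using the standard library's `importlib.metadata`. The 4.34.0 threshold comes from the versions reported in this thread (4.33.1 showed the bug, 4.34.0 did not), not from the release notes, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Threshold taken from this thread: 4.33.1 showed the bug, 4.34.0 did not.
FIRST_GOOD_VERSION = (4, 34, 0)


def parse_release(version_string: str) -> Tuple[int, int, int]:
    """Parse the leading release numbers, e.g. '4.35.0.dev0' -> (4, 35, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


def has_word_timestamp_fix(version_string: str) -> bool:
    # Pre-release suffixes are ignored, so a 4.34.0.dev build counts as fixed
    # even though it may predate the fix.
    return parse_release(version_string) >= FIRST_GOOD_VERSION
```

For example, `installed_transformers_version()` returns `None` when transformers is missing; otherwise `has_word_timestamp_fix(...)` on its result tells you whether upgrading is worth trying. Reading the package metadata avoids importing transformers itself.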

Thanks!

Tyler1992

Oct 5, 2023

My Transformers version was 4.33.1, and after updating to 4.34.0 the code works as expected! Whatever they are paying you, it should be more! Thank you very much.

Tyler1992 changed discussion status to closed Oct 5, 2023

sanchit-gandhi

Oct 6, 2023

Awesome to hear it's fixed! All the best with your Whisper endeavours, @Tyler1992!
