Time-codes from whisper

#71
by EranML - opened

Is it possible to get the time codes for each word of the text generated by Whisper?

Hey @EranML! You can get the timestamps of each segment: https://huggingface.co/openai/whisper-large-v2#long-form-transcription
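For reference, a minimal sketch of segment-level timestamps with the transformers pipeline, roughly following the long-form transcription section linked above. The helper names `format_chunks` and `transcribe_segments` are made up for illustration, and the audio path is whatever file you pass in. The last chunk can come back with an end time of None, so the formatter allows for that.

```python
from typing import List


def format_chunks(chunks) -> List[str]:
    """Turn pipeline chunks into readable '[start -> end] text' lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # The end time of the final chunk may be None, so print a placeholder.
        end_text = f"{end:.2f}" if end is not None else "?"
        lines.append(f"[{start:.2f} -> {end_text}] {chunk['text'].strip()}")
    return lines


def transcribe_segments(audio_path: str, model_id: str = "openai/whisper-large-v2"):
    """Run Whisper through the transformers ASR pipeline and return its chunks."""
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # enables long-form (chunked) transcription
    )
    # return_timestamps=True adds a "chunks" list with (start, end) tuples.
    return pipe(audio_path, return_timestamps=True)["chunks"]
```

Usage would be something like `print("\n".join(format_chunks(transcribe_segments("audio.mp3"))))`, where `audio.mp3` is a placeholder file name.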

For word-level timestamps, you can check out WhisperX: https://github.com/m-bain/whisperX

@EranML, the latest Whisper release (20230314) supports word-level timestamps and word-level posteriors. See the --word_timestamps option and set it to True.
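On the command line that would be along the lines of `whisper audio.mp3 --word_timestamps True`. From Python, a rough sketch with the openai-whisper package could look like the following. The helper names `collect_words` and `transcribe_words` are made up for illustration, the "base" model size is an arbitrary choice, and the dictionary layout (a "words" list per segment with "word", "start", "end" and "probability") is what I understand the package to return when word timestamps are enabled.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float, Optional[float]]


def collect_words(result: dict) -> List[Word]:
    """Flatten a Whisper result into (word, start, end, probability) tuples."""
    words = []
    for segment in result.get("segments", []):
        # "words" is only present when word_timestamps=True was requested.
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"], w.get("probability")))
    return words


def transcribe_words(audio_path: str, model_name: str = "base") -> List[Word]:
    """Transcribe a file and return its word-level timings."""
    # Imported lazily so collect_words can be used without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    return collect_words(result)
```

Each tuple gives the word, its start and end time in seconds, and the probability assigned to it.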

We're looking to add this to transformers too :)

@sanchit-gandhi, any idea when word-level timestamps will be added to transformers?

Hi all, I also noticed that the milliseconds part of the timestamps is rounded off, which leads to premature cut-offs when the timestamps are used to cut audio segments:

00:00:00.000 --> 00:00:05.000: There's so many things here, and in my house, that people are always saying,
00:00:05.000 --> 00:00:07.000: where did you get that? And I'm like, I don't know.

Is there a way to turn off the rounding, so we get the actual milliseconds? Thank you for your help.
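One thing worth ruling out is the formatting step. Below is a small sketch of a formatter that keeps the milliseconds when building cue lines in the layout shown above. The function names are made up for illustration. This only helps if the float timestamps coming out of the model still carry sub-second precision; if the model itself emits whole-second values, no formatting change will recover the missing detail.

```python
def format_vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm without dropping the milliseconds."""
    # Work in integer milliseconds to avoid floating-point truncation.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_cue(start: float, end: float, text: str) -> str:
    """Build one 'start --> end: text' line like the ones quoted above."""
    return f"{format_vtt_timestamp(start)} --> {format_vtt_timestamp(end)}: {text.strip()}"
```

For example, `format_cue(0.0, 5.48, "where did you get that?")` keeps the `.480` on the end time instead of collapsing it to `.000`.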

Update: for anyone who needs higher-resolution timestamps, I found this library, which works well for stabilizing the milliseconds portion of the VTT timecodes: https://pypi.org/project/stable-ts/
