The new Longform transcription method

#76
by deep-intel - opened

Is there an option to generate word-level timestamps with the promising new long-form transcription method, similar to what the pipeline already offers?

https://github.com/huggingface/transformers/pull/27658
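
For reference, the pipeline behaviour the question refers to can be sketched roughly as below. This is a minimal sketch, not the method from the linked PR: the function name `transcribe_with_word_timestamps`, the default checkpoint `openai/whisper-large-v3`, and the file name in the usage comment are assumptions for illustration, since the thread does not say which model or audio is involved.

```python
def transcribe_with_word_timestamps(audio_path, model_id="openai/whisper-large-v3"):
    """Chunked pipeline transcription with word-level timestamps.

    model_id is an assumption; the thread does not name a checkpoint.
    """
    # Imported lazily so the function can be defined without loading transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long audio into 30-second chunks
    )
    # return_timestamps="word" requests one (start, end) pair per word.
    result = asr(audio_path, return_timestamps="word")
    return result["chunks"]


# Hypothetical usage (downloads the model, so it is left commented out):
# for word in transcribe_with_word_timestamps("meeting.wav"):
#     print(word["timestamp"], word["text"])
```

Each returned item is a dict with a `text` field and a `timestamp` pair; whether the new long-form path can produce the same output is exactly what is being asked here.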

Any updates on this?

@patrickvonplaten
Could you share your thoughts on this, since you did pivotal work on this PR?

I probably won't find time for this - can we open a feature request on Transformers?

cc @sanchit-gandhi as well

Quick question @patrickvonplaten @sanchit-gandhi: with the new method, is the full audio loaded into (GPU) memory in one go? If so, I guess that differs from how the pipeline handles it. I ask because I could process very long audio with the pipeline, but a file of only about 30 minutes ran out of memory with the new long-form transcription.
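
As a rough aid for reasoning about the memory question, the size of the log-mel input features can be estimated with simple arithmetic. This is a back-of-the-envelope sketch that assumes the usual Whisper front end (16 kHz audio, hop length of 160 samples, 80 mel bins, float32 values); the helper name `log_mel_feature_bytes` is hypothetical. It covers only the input features, not model weights, attention caches, or any other allocation, so it does not by itself explain the out-of-memory error.

```python
def log_mel_feature_bytes(duration_s, sampling_rate=16_000, hop_length=160,
                          n_mels=80, bytes_per_value=4):
    """Estimate the size in bytes of a log-mel feature tensor.

    Defaults are assumptions matching the common Whisper feature extractor;
    adjust n_mels or bytes_per_value for other checkpoints or dtypes.
    """
    # One feature frame is produced per hop_length audio samples.
    n_frames = int(duration_s * sampling_rate) // hop_length
    return n_frames * n_mels * bytes_per_value


full_audio = log_mel_feature_bytes(30 * 60)  # features for the whole 30-minute file
one_chunk = log_mel_feature_bytes(30)        # features for a single 30-second window
print(f"30 min of audio: {full_audio / 1e6:.1f} MB of features")
print(f"30 s chunk:      {one_chunk / 1e6:.2f} MB of features")
```

Under these assumptions the features for 30 minutes of audio come to about 57.6 MB, versus just under 1 MB for one 30-second window.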
