The new long-form transcription method

by deep-intel - opened

Is there an option to generate word-level timestamps with the promising new long-form transcription method, akin to what we have with the pipeline?
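For reference, the pipeline behaviour mentioned above can be sketched roughly as follows. This assumes the `transformers` automatic-speech-recognition pipeline with a Whisper checkpoint. The model id `openai/whisper-large-v3` and the helper names `transcribe_with_word_timestamps` and `format_words` are illustrative assumptions, not something stated in this thread. It shows only the existing pipeline path, not the new long-form method.

```python
def transcribe_with_word_timestamps(audio_path, model_id="openai/whisper-large-v3"):
    """Run the chunked ASR pipeline and ask for word-level timestamps.

    The model id is an assumption for illustration; any Whisper checkpoint
    supported by the pipeline should work the same way.
    """
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # process the audio in 30-second windows
    )
    # The result is a dict with "text" and a list of "chunks"; each chunk
    # carries a "text" and a "timestamp" pair of (start, end) in seconds.
    return asr(audio_path, return_timestamps="word")


def format_words(chunks):
    """Turn the pipeline's word chunks into printable 'start-end word' lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # The end time of the final word can be missing, so guard against None.
        end_label = "?" if end is None else f"{end:.2f}"
        lines.append(f"{start:.2f}-{end_label} {chunk['text'].strip()}")
    return lines
```

Usage would look like `format_words(transcribe_with_word_timestamps("audio.wav")["chunks"])`, where `audio.wav` is a placeholder path.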

Any updates on this?

Could you share your thoughts on this, since you did pivotal work on this PR?

I probably won't find time for this - can we open a feature request on Transformers?

cc @sanchit-gandhi as well

Quick question @patrickvonplaten @sanchit-gandhi: with the new method, do we load the full audio into (GPU) memory in one go? If so, I guess that is different from how the pipeline would have handled it? The reason I ask is that I could process very long audio with the pipeline, but an audio file of only about 30 minutes ran out of memory with the new long-form transcription.
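To make the memory contrast concrete, here is a minimal sketch of why chunked processing keeps the input to each forward pass bounded regardless of total audio length. It is not the pipeline's actual implementation, and it does not settle how the new long-form method handles memory. The function name `chunk_bounds` and the default values (16 kHz audio, 30-second windows, 5 seconds of overlap on each side) are assumptions for illustration.

```python
def chunk_bounds(num_samples, sampling_rate=16_000,
                 chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) sample indices of overlapping windows.

    Each window is at most chunk_length_s long, and consecutive windows
    overlap by stride_length_s on each side, so the model never sees more
    than one window of audio at a time.
    """
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # how far each window advances
    if step <= 0:
        raise ValueError("chunk length must exceed twice the stride")

    bounds = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += step
    return bounds


# A 30-minute recording at 16 kHz is split into many small windows,
# each no longer than 30 seconds of samples.
windows = chunk_bounds(30 * 60 * 16_000)
longest = max(end - start for start, end in windows)
```

Under these assumptions the largest single input is one 30-second window, whereas a method that keeps the whole recording on the device at once would scale with the full 30 minutes.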
