
Regroup options?

#1
by Dgoryeo - opened

Hi,

Thank you so much for this very promising work.

I just came across your work and ran a couple of tests. I got quite long transcription lines with long durations. Would it be possible to let the user set the options for stable_ts's regroup function to allow shorter transcription lines? Or are there other ways to break the output into smaller lines/durations, such as breaking at a space or comma?

Thanks again for your superb work!

Kotoba Technologies org

Hi, thanks for the feedback! I have investigated stable-ts, and it turns out that the kotoba-whisper models are unable to generate word-level (character-level) timestamps, so stable-ts cannot perform its fine-grained timestamp reorganization. For that reason, stable-ts shouldn't be used with these models, as it would just make the timestamps coarser.
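
For readers who still want shorter lines without word-level timestamps, one rough workaround is to split each segment's text at spaces or punctuation and spread the segment's duration across the pieces in proportion to their character counts. The sketch below is only an illustration of that idea: `split_segment` is a hypothetical helper (not part of stable-ts or kotoba-whisper), the default delimiter set is an assumption, and the resulting times are linear estimates rather than real alignments.

```python
def split_segment(text, start, end, delimiters="、。，,!?！？ "):
    """Split one transcribed segment at delimiters and estimate timestamps.

    Hypothetical helper: timestamps are interpolated linearly by character
    count, so they are approximations, not model-derived alignments.
    """
    # Cut the text after each delimiter, keeping the delimiter with the
    # piece that precedes it. Whitespace-only pieces are never emitted.
    pieces, current = [], ""
    for ch in text:
        current += ch
        if ch in delimiters and current.strip():
            pieces.append(current)
            current = ""
    if current.strip():
        pieces.append(current)

    # Share the segment duration among pieces by their character counts.
    total = sum(len(p) for p in pieces)
    result, consumed = [], 0
    for p in pieces:
        piece_start = start + (end - start) * consumed / total
        consumed += len(p)
        piece_end = start + (end - start) * consumed / total
        result.append({"text": p.strip(), "start": piece_start, "end": piece_end})
    return result


# Example: one 9-second segment split at the Japanese comma.
for line in split_segment("こんにちは、世界。", 0.0, 9.0):
    print(line)
```

Because speech rate is not uniform, the estimated boundaries can drift from the actual audio, so this is better suited to making subtitles more readable than to precise alignment.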

asahi417 changed discussion status to closed
