Is it possible to silence non-verbal parts of an audio?
Hi,
Basically, whenever I am not talking, the audio is either silent or contains some other noise, such as throat clearing. The recording has no background noise, so the speech is quite clear, and I want to keep only the verbal parts without changing the audio length, as it's synced to video.
Is there any tool or API that can do this? I tried a few online splitter tools, but they failed to separate the throat clearing from the verbal parts.
I thought maybe I could use the Whisper API here to detect the timestamps where there is speech and silence everything else. Is that feasible?
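A minimal sketch of the silencing half of that idea, assuming the speech timestamps are already available as `(start, end)` pairs in seconds and the audio is loaded as a NumPy array. The function name `silence_outside_segments` and the `pad` parameter are hypothetical, introduced only for illustration. Samples outside the segments are zeroed rather than cut, so the length stays the same and the sync with the video is preserved.

```python
import numpy as np

def silence_outside_segments(samples, sample_rate, segments, pad=0.0):
    """Return a copy of `samples` with everything outside `segments` zeroed.

    samples     : 1-D (mono) or 2-D (frames, channels) array
    sample_rate : samples per second
    segments    : iterable of (start_sec, end_sec) speech intervals (assumed input)
    pad         : hypothetical safety margin in seconds kept around each interval
    """
    samples = np.asarray(samples)
    n_frames = samples.shape[0]
    keep = np.zeros(n_frames, dtype=bool)  # True where speech should be kept

    for start, end in segments:
        lo = max(0, int(round((start - pad) * sample_rate)))
        hi = min(n_frames, int(round((end + pad) * sample_rate)))
        if hi > lo:
            keep[lo:hi] = True

    out = samples.copy()
    out[~keep] = 0  # zero the non-speech frames; the array length is unchanged
    return out
```

With the open-source `whisper` package, the result of `transcribe` has a `"segments"` list whose entries carry `"start"` and `"end"` in seconds, so those could be passed in as the pairs. Whether this removes throat clearing depends on how tight the timestamps are: segment-level timestamps tend to span short pauses and noises between words, so word-level timestamps or a dedicated voice activity detector may be needed. That part is an assumption and would have to be checked on the sample.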
It's about 80 hours of audio (~200 files).
I attached a very small sample if you want to test it:
I tried this code, which uses the original Whisper API, on this audio, but it didn't silence the throat-clearing part:
https://paste.ofcode.org/Gc9MUy83K9UHATUHPDVyZ4
Thanks a lot in advance.