Fine-tuning Whisper models for shorter audio segments

#89
by Malishevsky - opened

Hi all. My project needs to recognize many short audio segments. Can I fine-tune the multilingual model for short audio clips of around 10 seconds? If not, can I train a model from scratch for this purpose? I would be grateful for any help and hints.

Whisper should already perform well on 10-second audio clips, so I don't see much point in doing this, but you can try fine-tuning the model to see whether it performs better on shorter audio. Here is a Colab notebook you can try: https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb. There are also things you can try at the code level to improve accuracy, such as voice activity detection (VAD), instead of fine-tuning the model.
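To make the VAD suggestion a bit more concrete, here is a minimal sketch of the idea: trim leading and trailing silence from a clip before passing it to Whisper. It is only a naive energy-threshold illustration, not how dedicated VAD tools work internally. The function names `frame_rms` and `trim_silence`, the 30 ms frame size, and the `0.01` threshold are all assumptions chosen for the example, and samples are assumed to be floats in the range -1.0 to 1.0 at 16 kHz.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def trim_silence(samples, sample_rate=16000, frame_ms=30,
                 threshold=0.01, pad_frames=1):
    """Drop leading/trailing frames whose RMS energy is below `threshold`.

    `pad_frames` keeps a little context on each side of the detected speech.
    Returns an empty list if no frame reaches the threshold.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_rms(samples, frame_len)
    voiced = [i for i, e in enumerate(energies) if e >= threshold]
    if not voiced:
        return []
    first = max(voiced[0] - pad_frames, 0)
    last = min(voiced[-1] + pad_frames, len(energies) - 1)
    return samples[first * frame_len:(last + 1) * frame_len]


# Synthetic clip: 0.5 s silence, 1 s of a 440 Hz tone, 0.5 s silence.
rate = 16000
silence = [0.0] * (rate // 2)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
clip = silence + tone + silence

trimmed = trim_silence(clip, sample_rate=rate)
print(len(clip), len(trimmed))
```

On real recordings a fixed threshold like this is fragile with background noise, so a trained VAD is usually the better choice; the sketch only shows where such a step would sit in the pipeline.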

