Fine-tune Whisper on a local custom dataset

#75
by hamza94 - opened

I want to fine-tune Whisper on a local custom dataset and train it for a phoneme detection task. The problem is that all the fine-tuning tutorials out there use prebuilt tokenizers and datasets such as Common Voice. Does anyone know how to fine-tune Whisper from the ground up? What steps are involved, and is it possible to train for phoneme detection? Being able to create custom datasets that are ready for training Whisper would be great for the community!

Hey @hamza94! I think the easiest thing to do here would be to convert your custom dataset into the Hugging Face datasets format; see https://huggingface.co/docs/datasets/audio_dataset
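For local files, one possible route is the `audiofolder` loader described in that guide: put the audio files in a directory together with a `metadata.csv` whose `file_name` column points at each clip. The sketch below is only an illustration under some assumptions: the helper names `write_metadata` and `load_local_dataset`, the `transcription` column name, and the example file names are all hypothetical, and for phoneme detection the text column would hold your phoneme strings.

```python
import csv
from pathlib import Path


def write_metadata(data_dir, transcripts):
    """Write metadata.csv next to the audio files.

    transcripts: dict mapping a file name relative to data_dir
    (hypothetical, e.g. "clip_001.wav") to its target text.
    """
    metadata_path = Path(data_dir) / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # "file_name" is the column the audiofolder loader looks for;
        # "transcription" is an assumed name for the label column.
        writer.writerow(["file_name", "transcription"])
        for file_name, text in sorted(transcripts.items()):
            writer.writerow([file_name, text])
    return metadata_path


def load_local_dataset(data_dir):
    """Load the directory as a Hugging Face dataset (requires `datasets`)."""
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    return load_dataset("audiofolder", data_dir=str(data_dir))
```

After calling `write_metadata("my_audio", {...})` on your own directory, `load_local_dataset("my_audio")` should give you a dataset with an `audio` column and the text column from the CSV.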

Once you've done this, you can replace common_voice with your custom dataset in the fine-tuning blog post and set the correct language for your task - there are no other code changes required! You'll be able to load it using load_dataset and preprocess it using map in exactly the same way.
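As a rough sketch of what that preprocessing step could look like for a custom dataset (not the blog post's exact code): the helper names `make_prepare_fn` and `preprocess` are hypothetical, the text column is assumed to be called `transcription`, the checkpoint `openai/whisper-small` and the language are placeholders, and the dataset is assumed to be a single split (a `Dataset`, not a `DatasetDict`).

```python
def make_prepare_fn(feature_extractor, tokenizer, text_column="transcription"):
    """Build a per-example function suitable for Dataset.map."""

    def prepare(example):
        audio = example["audio"]
        # Compute log-Mel input features from the raw waveform.
        features = feature_extractor(
            audio["array"], sampling_rate=audio["sampling_rate"]
        )
        example["input_features"] = features.input_features[0]
        # Encode the target text into label token ids.
        example["labels"] = tokenizer(example[text_column]).input_ids
        return example

    return prepare


def preprocess(dataset, model_name="openai/whisper-small", language="English"):
    """Resample to 16 kHz and map the dataset into Whisper training inputs."""
    from datasets import Audio  # requires `datasets`
    from transformers import WhisperProcessor  # requires `transformers`

    processor = WhisperProcessor.from_pretrained(
        model_name, language=language, task="transcribe"
    )
    # Whisper's feature extractor expects 16 kHz audio.
    dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
    prepare = make_prepare_fn(processor.feature_extractor, processor.tokenizer)
    return dataset.map(prepare, remove_columns=dataset.column_names)
```

Splitting the per-example function out of `preprocess` is just a design choice here: it lets you check the column handling on a single example before running `map` over the whole dataset.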

Let me know if you encounter any issues :)
