How to fine-tune it with Whisper medium?
Hi, I am getting the error "RuntimeError: Given groups=1, weight of size [768, 80, 3], expected input[8, 70, 3000] to have 80 channels, but got 70 channels instead" while trying to fine-tune the Whisper medium model on this dataset: https://huggingface.co/datasets/seanghay/khmer-speech-large
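For context, the weight in that error, [768, 80, 3], is the encoder's first Conv1d layer in [out_channels, in_channels, kernel_size] order, so the model expects 80 log-Mel bins per frame while the batch [8, 70, 3000] carries only 70. (768 is also the hidden size of whisper-small; whisper-medium uses 1024, so it may be worth double-checking which checkpoint is actually loaded.) A minimal debugging sketch, assuming each example's `input_features` has shape (mel_bins, frames), is to assert that shape before training. `check_features` is a hypothetical helper, and 80 x 3000 is what the stock Whisper feature extractor produces for 30 seconds of 16 kHz audio.

```python
import numpy as np

# Stock Whisper feature extractor output: 80 log-Mel bins x 3000 frames
# (30 s of 16 kHz audio with a hop length of 160 samples)
EXPECTED_MEL_BINS = 80
EXPECTED_FRAMES = 3000


def check_features(input_features, mel_bins=EXPECTED_MEL_BINS, frames=EXPECTED_FRAMES):
    """Hypothetical debugging helper: raise ValueError if one example's
    features do not have the (mel_bins, frames) shape the encoder expects."""
    shape = np.asarray(input_features).shape
    if shape != (mel_bins, frames):
        raise ValueError(
            f"expected input_features of shape ({mel_bins}, {frames}), got {shape}"
        )
    return shape


# A correctly shaped example passes; a 70-bin example fails like the error above
print(check_features(np.zeros((80, 3000), dtype=np.float32)))
try:
    check_features(np.zeros((70, 3000), dtype=np.float32))
except ValueError as err:
    print(err)
```

After mapping `prepare_dataset` over the dataset, calling something like `check_features(dataset[0]["input_features"])` on a few examples should show quickly whether the 70-channel features come from preprocessing.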
Maybe it's related to the maximum text length and audio length. You may need to limit those.
OK, thanks. I will try that.
How do I limit them? Here is my prepare_dataset function, which is based on the code at https://github.com/vb100/whisper_ai_finetune/blob/main/whisper_finetuning.py
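```python
from transformers import WhisperFeatureExtractor, WhisperTokenizer

# Assumed setup, not shown in the original snippet: load the parts of the checkpoint being fine-tuned
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-medium")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-medium", language="khmer", task="transcribe")


def prepare_dataset(batch):
    """
    Prepare audio data to be suitable for Whisper AI model.
    """
    # (1) load audio; this does not resample, so the column must already be 16 kHz
    #     (e.g. dataset.cast_column("audio", Audio(sampling_rate=16000)))
    audio = batch["audio"]
    # (2) compute log-Mel input features from input audio array
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    # (3) encode target text to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch
```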
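One possible way to apply the length limit, sketched under assumptions: drop examples that exceed Whisper's limits instead of letting them be cropped. The Whisper feature extractor pads or truncates audio to 30-second windows, and the decoder accepts at most 448 label ids (`max_target_positions` in the Whisper config). `is_within_limits` and the `input_length` column below are hypothetical names, and the thresholds are defaults you may want to tune.

```python
MAX_AUDIO_SECONDS = 30.0  # Whisper's feature extractor pads/truncates to 30 s windows
MAX_LABEL_TOKENS = 448    # max_target_positions of the Whisper decoder


def is_within_limits(num_samples, sampling_rate, num_label_tokens,
                     max_seconds=MAX_AUDIO_SECONDS, max_tokens=MAX_LABEL_TOKENS):
    """Hypothetical helper: True if the clip is at most `max_seconds` long
    and its tokenized transcript has at most `max_tokens` label ids."""
    duration = num_samples / sampling_rate
    return duration <= max_seconds and num_label_tokens <= max_tokens


# Usage sketch with the Hugging Face `datasets` library (not run here); it assumes
# prepare_dataset also stores len(audio["array"]) in a hypothetical "input_length" column:
# dataset = dataset.filter(
#     lambda ex: is_within_limits(ex["input_length"], 16000, len(ex["labels"]))
# )

print(is_within_limits(16000 * 12, 16000, 120))  # 12 s clip, 120 label ids: kept
print(is_within_limits(16000 * 45, 16000, 120))  # 45 s clip: dropped
```

Filtering keeps every remaining example intact, at the cost of losing the long clips; splitting long recordings into shorter segments with matching transcripts would be the alternative if that loses too much data.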
Will it affect the accuracy of the trained model, since the audio is cropped?
Yes, it will. Here is the code I used to train the model: https://github.com/seanghay/whisper-tiny-khmer
OK, thanks.
I ran a test by training for only 8 steps with the tiny model, replacing the dataset "seanghay/km-augmented-16-combined" with "seanghay/km-speech-corpus". When I transcribe Khmer audio, the output looks like this: " Sù sế rịch rí đã ban chục nè nè sốc sắp bài tây." instead of Khmer text such as "សួស្តី រីករាយ ដែល បាន ជួប អ្នក អ្នក សុខ សប្បាយ ទេ" (roughly "Hello, nice to meet you. How are you?"). Why?