How to fine-tune Whisper medium with it

#3
by ken0997 - opened

Hi, I am getting the error "RuntimeError: Given groups=1, weight of size [768, 80, 3], expected input[8, 70, 3000] to have 80 channels, but got 70 channels instead" while trying to fine-tune the Whisper medium model with this dataset: "https://huggingface.co/datasets/seanghay/khmer-speech-large"
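For anyone reading the error above: a 1-D convolution weight of size [768, 80, 3] expects 80 input channels (here, mel bins), while the batch has shape [8, 70, 3000], i.e. 8 examples with 70 channels and 3000 frames each. A minimal sketch of a shape check that could run before the batch reaches the model is shown below. The function name `check_feature_shape` and its defaults are hypothetical, taken only from the numbers in the error message; they are not part of any library.

```python
def check_feature_shape(batch_shape, expected_mels=80, expected_frames=3000):
    """Return a list of problems found in a (batch, mels, frames) shape.

    Hypothetical helper: the defaults (80 mel bins, 3000 frames) come from
    the error message in this thread, not from a library constant.
    """
    if len(batch_shape) != 3:
        return [f"expected 3 dimensions (batch, mels, frames), got {len(batch_shape)}"]

    _, mels, frames = batch_shape
    problems = []
    if mels != expected_mels:
        problems.append(f"expected {expected_mels} mel channels, got {mels}")
    if frames != expected_frames:
        problems.append(f"expected {expected_frames} frames, got {frames}")
    return problems


# The shape reported in the error message above.
for problem in check_feature_shape((8, 70, 3000)):
    print(problem)
```

Assuming the batch is a tensor, it could be called as `check_feature_shape(tuple(batch["input_features"].shape))` inside the data collator to find the offending batch early.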

Maybe it's related to max text length and audio length. You may need to limit those.

OK, thanks. I will try that.

Maybe it's related to max text length and audio length. You may need to limit those.

How do I limit them? I have a prepare_dataset function here, based on the code at https://github.com/vb100/whisper_ai_finetune/blob/main/whisper_finetuning.py
def prepare_dataset(batch):
    """
    Prepare audio data to be suitable for Whisper AI model.
    """
    # (1) load and resample audio data from 48 to 16kHz
    audio = batch["audio"]

    # (2) compute log-Mel input features from input audio array
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]

    # (3) encode target text to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch

Will it affect the accuracy of the trained model if the audio is cropped?

Yes, it will. This is the code I used to train the model: https://github.com/seanghay/whisper-tiny-khmer
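One way to limit audio and text length without cropping is to drop over-long examples entirely, so that no transcript is paired with truncated audio. The following is only a rough sketch of that idea, not the code from the linked repository. The names `is_within_limits`, `MAX_AUDIO_SECONDS` and `MAX_LABEL_TOKENS` are hypothetical; the limits assume Whisper's 30-second input window and 448-token decoder limit, and the example is assumed to still carry the "audio" and "labels" fields produced by prepare_dataset.

```python
# Assumed limits: Whisper pads or truncates audio to 30 s, and its decoder
# accepts at most 448 target positions.
MAX_AUDIO_SECONDS = 30.0
MAX_LABEL_TOKENS = 448


def is_within_limits(example,
                     max_seconds=MAX_AUDIO_SECONDS,
                     max_tokens=MAX_LABEL_TOKENS):
    """Return True if the example's audio and labels fit the limits.

    Hypothetical helper: expects example["audio"] with "array" and
    "sampling_rate", and example["labels"] as a list of token ids.
    """
    audio = example["audio"]
    duration = len(audio["array"]) / audio["sampling_rate"]
    n_tokens = len(example["labels"])
    # Reject empty audio or empty transcripts as well as over-long ones.
    return 0.0 < duration <= max_seconds and 0 < n_tokens <= max_tokens
```

With the Hugging Face datasets library, this could be applied after mapping prepare_dataset, for example `dataset = dataset.filter(is_within_limits)`, provided the "audio" column has not been removed by then.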

ok thanks

I tested this by training for only 8 steps with the tiny model, replacing the dataset "seanghay/km-augmented-16-combined" with "seanghay/km-speech-corpus". When I transcribe Khmer-language audio, the output looks like " Sù sế rịch rí đã ban chục nè nè sốc sắp bài tây." instead of Khmer words like "សួស្តី រីករាយ ដែល បាន ជួប អ្នក អ្នក សុខ សប្បាយ ទេ". Why?
