Fine-Tuning Results in Empty Predictions

#33
by wall-daniel - opened

Hello, I'm currently trying to fine-tune Whisper medium on 10 hours of Swahili audio. Whenever I fine-tune, after about an epoch the model starts predicting nothing but <|endoftext|> tokens. I tried fine-tuning Whisper large-v3 and Whisper small without any problem; the only differences are the base checkpoint and the processor I'm using.

Does anyone have any idea why this is happening?

Hey @wall-daniel - that's very strange! Could you share the code you're using to fine-tune the medium checkpoint? I'd be interested in taking a look at how you set the language/task arguments (a sketch of a typical setup follows the snippet below). We could also remove very short examples from the training set, since these are often erroneous examples in the Common Voice dataset. We can add these lines after applying our preprocessing function to filter such examples out:

max_label_length = model.config.max_length
min_label_length = 6  # 5 special tokens (BOS, language, task, notimestamps, EOS) + 1 token minimum

def filter_labels(labels):
    """Keep label sequences at least min_label_length tokens long and shorter than max_label_length"""
    return min_label_length <= len(labels) < max_label_length

vectorized_datasets = vectorized_datasets.filter(filter_labels, input_columns=["labels"])
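On the language/task question: when fine-tuning a multilingual checkpoint, the language and task are typically pinned on both the processor and the model's generation config so that generation does not fall back to language auto-detection. Below is a minimal sketch of that setup, assuming a recent transformers release; the checkpoint name and language strings are placeholders for this Swahili run, not the poster's actual code:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder checkpoint; swap in the checkpoint you are fine-tuning.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")

# Setting language/task here makes the tokenizer prepend the matching special
# tokens (start-of-transcript, language, task, notimestamps) to each label sequence.
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-medium", language="Swahili", task="transcribe"
)

# Pin the same language/task for generation (attribute names assume a recent
# transformers version); a mismatch between the training labels and these
# settings is one way to end up with degenerate, empty predictions.
model.generation_config.language = "swahili"
model.generation_config.task = "transcribe"

If the labels were built with different language/task tokens than the ones used at generation time, that mismatch is worth ruling out first.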
