Repeated tokens when decoding after PEFT fine-tuning Whisper

#85
by louislau1129 - opened

Hi, first, thank you for open-sourcing such a good ASR project. I have recently been investigating Whisper in my research and applied LoRA for parameter-efficient fine-tuning on my dataset (a 30h Mandarin Chinese speech corpus). Before fine-tuning, Whisper achieves about 10% WER. After fine-tuning, however, decoding has a problem: the model repeatedly outputs some tokens multiple times.

It looks like this:
*(screenshot of the decoded output, showing the repeated tokens)*

The ground-truth text is as follows:
*(screenshot of the ground-truth transcript)*

Below are the relevant configuration snippets from my code, with batch_size=2, num_train_epochs=3, and fp16=True.

        from peft import LoraConfig
        from transformers import Seq2SeqTrainingArguments

        # LoRA config: rank-32 adapters on the attention query/value projections
        config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05, bias="none")

        # training args
        training_args = Seq2SeqTrainingArguments(
            output_dir=args.output_dir,  # change to a repo name of your choice
            per_device_train_batch_size=batch_size,
            gradient_accumulation_steps=16 // batch_size,  # increase by 2x for every 2x decrease in batch size
            gradient_checkpointing=args.gradient_checkpoint,
            learning_rate=1e-3,
            warmup_steps=50,
            num_train_epochs=3,
            evaluation_strategy="epoch",
            fp16=fp16,
            per_device_eval_batch_size=16,
            eval_accumulation_steps=1,  # otherwise predictions accumulate on the GPU and can cause OOM
            generation_max_length=128,
            logging_steps=25,
            remove_unused_columns=False,  # required as the PeftModel forward doesn't have the signature of the wrapped model's forward
            label_names=["labels"],  # same reason as above
            report_to=["tensorboard"],
        )

It would be much appreciated if anyone has any idea about this issue, and please let me know if you need any more info/clues. Thanks!

I just found that I had missed adding <|endoftext|> at the end of each sentence, because I call the tokenizer with add_special_tokens=False. Without this special end token, the model never learns when to stop generating after fine-tuning. Now it works as normal.
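
For anyone hitting the same issue, a minimal sketch of the fix, assuming the standard transformers WhisperTokenizer; the openai/whisper-small checkpoint, language, and example text are placeholders for illustration:

        from transformers import WhisperTokenizer

        # placeholder checkpoint and language; use your own
        tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="zh", task="transcribe")

        text = "你好世界"

        # What I originally did: no special tokens, so no <|endoftext|> at the end
        ids_no_eos = tokenizer(text, add_special_tokens=False).input_ids

        # Fix A: use the default (add_special_tokens=True), which appends
        # <|endoftext|> (tokenizer.eos_token_id) automatically
        ids_default = tokenizer(text).input_ids

        # Fix B: keep add_special_tokens=False but append the EOS token manually
        ids_manual = ids_no_eos + [tokenizer.eos_token_id]

        assert ids_default[-1] == tokenizer.eos_token_id
        assert ids_manual[-1] == tokenizer.eos_token_id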

Awesome @louislau1129, great find 🙌

sanchit-gandhi changed discussion status to closed
