Error encountered when fine-tuning

#30 · opened by yongleyuan

When I use `AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")` to process batches for fine-tuning, I get an error with the following message:

```
RuntimeError: The expanded size of the tensor (4128) must match the existing size (3096) at non-singleton dimension 3.  Target sizes: [4, 16, 4128, 4128].  Tensor sizes: [4, 1, 3096, 3096]
```

The traceback shows that the error comes from the forward function in `modeling_mllama.py`:

```python
attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
```

It looks like something is wrong with the attention mask, even though it comes directly from the loaded processor.
Any idea what causes the error and how I can fix it?
Happy to provide more details, thanks!
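For reference, here is a minimal sketch of roughly how I build a batch. The file names, prompts, and the `labels` choice are placeholders standing in for my actual dataset and training loop:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision"
processor = AutoProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder batch; the real images and prompts come from my dataset.
images = [Image.open("a.jpg").convert("RGB"), Image.open("b.jpg").convert("RGB")]
texts = ["<|image|>Describe this picture.", "<|image|>What objects are visible?"]

# padding=True pads to the longest sequence in the batch; the returned
# attention_mask is what eventually reaches scaled_dot_product_attention.
batch = processor(images=images, text=texts, padding=True, return_tensors="pt")

# The RuntimeError is raised inside the attention forward during this call.
outputs = model(**batch, labels=batch["input_ids"])
```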

Meta Llama org

Hi @yongleyuan, excited to see what you are building with the models!

Could you kindly use the MllamaProcessor class instead? See the model card for reference, or the example here.

We also provide a fine-tuning implementation here for reference.
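For example, something along these lines should work (a sketch adapted from the model card snippet; the image path and prompt below are illustrative):

```python
import torch
from PIL import Image
from transformers import MllamaProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision"
processor = MllamaProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative image and prompt; the base model expects the <|image|> token
# to precede the text.
image = Image.open("example.jpg").convert("RGB")
prompt = "<|image|><|begin_of_text|>Describe what you see."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0]))
```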

Hi! Has this problem been solved? I run into the same error when I try to run inference with a fine-tuned Llama-3.2-Vision-11B model.

Using MllamaProcessor instead of AutoProcessor did resolve this issue (thanks to @Sanyam's suggestion), but I also identified some other bugs in my code, so I am not sure whether AutoProcessor was the direct cause.
