Error encountered when fine-tuning

#30 · opened by yongleyuan

When I use `AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision")` to process batches for fine-tuning, I get an error with the following message:

```
RuntimeError: The expanded size of the tensor (4128) must match the existing size (3096) at non-singleton dimension 3.  Target sizes: [4, 16, 4128, 4128].  Tensor sizes: [4, 1, 3096, 3096]
```

The traceback shows that the error comes from the forward function in `modeling_mllama.py`:

```python
attn_output = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
```

It looks like something is wrong with the attention mask, even though it comes directly from the loaded processor.
Any idea what causes the error and how I can fix it?
Happy to provide more details, thanks!
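For reference, here is a minimal sketch of roughly how I build a batch. The file names, prompts, and the `labels` choice are placeholders standing in for my actual dataset and training loop:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision"
processor = AutoProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder batch; the real images and prompts come from my dataset.
images = [Image.open("a.jpg").convert("RGB"), Image.open("b.jpg").convert("RGB")]
texts = ["<|image|>Describe this picture.", "<|image|>What objects are visible?"]

# padding=True pads to the longest sequence in the batch; the returned
# attention_mask is what eventually reaches scaled_dot_product_attention.
batch = processor(images=images, text=texts, padding=True, return_tensors="pt")

# The RuntimeError is raised inside the attention forward during this call.
outputs = model(**batch, labels=batch["input_ids"])
```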

Meta Llama org

Hi @yongleyuan, excited to see what you are building with the models!

Could you kindly use the MllamaProcessor class instead? See the model card for reference, or the example here.

We also provide a fine-tuning implementation here for reference.
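For example, something along these lines should work (a sketch adapted from the model card snippet; the image path and prompt below are illustrative):

```python
import torch
from PIL import Image
from transformers import MllamaProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision"
processor = MllamaProcessor.from_pretrained(model_id)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative image and prompt; the base model expects the <|image|> token
# to precede the text.
image = Image.open("example.jpg").convert("RGB")
prompt = "<|image|><|begin_of_text|>Describe what you see."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0]))
```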

Hi! Has this problem been solved? I run into the same error when I try to run inference with a fine-tuned Llama-3.2-Vision-11B model.

Using MllamaProcessor instead of AutoProcessor did resolve this issue (thanks to @Sanyam's suggestion), but I also identified some other bugs in my code, so I am not sure whether AutoProcessor was the direct cause.
