Weights shape mismatch when loading pixtral-12b with LlavaForConditionalGeneration

Script:

from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image

model_id = "hf-internal-testing/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

inputs = processor(images=IMG_URLS, text=PROMPT, return_tensors="pt")
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

Error:

Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [00:30<00:00,  5.16s/it]
Traceback (most recent call last):
  File "/Users/varb/exo/test.py", line 70, in <module>
    model = LlavaForConditionalGeneration.from_pretrained(model_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/varb/transformers/src/transformers/modeling_utils.py", line 3976, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/varb/transformers/src/transformers/modeling_utils.py", line 4511, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlavaForConditionalGeneration:
        size mismatch for language_model.model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
        size mismatch for language_model.model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for language_model.model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for language_model.model.layers.0.self_attn.o_proj.weight: copying a param with shape torch.Size([5120, 4096]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
        ... (the same q_proj/k_proj/v_proj/o_proj size mismatches repeat identically for layers 1 through 39) ...
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
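For context, the reported shapes are consistent with the checkpoint having been exported with an explicit head_dim of 128, while the instantiated model falls back to hidden_size // num_attention_heads (160) when head_dim is absent from the config. A minimal sketch of that arithmetic (the head counts are assumptions inferred from the error message, not read from the actual config):

```python
# Shape arithmetic behind the mismatch (numbers taken from the error message).
hidden_size = 5120
num_heads = 32        # assumed number of attention heads, inferred from 4096 / 128
num_kv_heads = 8      # assumed number of KV heads (grouped-query attention), from 1024 / 128

head_dim_checkpoint = 128                    # head_dim the weights appear to use
head_dim_default = hidden_size // num_heads  # 160: fallback when head_dim is unset

# q_proj: (num_heads * head_dim, hidden_size)
q_ckpt = (num_heads * head_dim_checkpoint, hidden_size)   # matches "from checkpoint"
q_model = (num_heads * head_dim_default, hidden_size)     # matches "current model"

# k_proj / v_proj: (num_kv_heads * head_dim, hidden_size)
k_ckpt = (num_kv_heads * head_dim_checkpoint, hidden_size)
k_model = (num_kv_heads * head_dim_default, hidden_size)

print(q_ckpt, q_model)  # (4096, 5120) vs (5120, 5120)
print(k_ckpt, k_model)  # (1024, 5120) vs (1280, 5120)
```

If this reading is right, the fix belongs in the model config (an explicit head_dim) rather than in `ignore_mismatched_sizes=True`, which would silently re-initialize every attention projection.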

Updated Script:

from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image

model_id = "hf-internal-testing/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

inputs = processor(images=IMG_URLS, text=PROMPT, return_tensors="pt")
generate_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pixel_values=inputs["pixel_values"][0],
    max_new_tokens=500,
)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]