shape mismatch: value tensor of shape [2320] cannot be broadcast to indexing result of shape [2262]

#52
by yeargun - opened

I am trying the very first example given in the documentation, though I'm fairly new to deploying LLMs.

from transformers import AutoProcessor, AutoModelForVision2Seq

# messages, image1 and image2 are defined as in the documentation example (not shown here)
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    # torch_dtype=torch.float16,
    # _attn_implementation="flash_attention_2",
    # quantization_config=quantization_config,
)
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=1500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

And I get the following error:
shape mismatch: value tensor of shape [2320] cannot be broadcast to indexing result of shape [2262]
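
For context, this kind of message is what PyTorch raises when a masked assignment receives a different number of values than the mask selects. The sketch below reproduces that class of error with toy tensors. The names and sizes are made up for illustration, and whether this is exactly what happens inside the idefics2 forward pass is an assumption, since no traceback was posted.

```python
import torch

# Toy stand-ins with assumed sizes; these are NOT the real idefics2 tensors.
embeddings = torch.zeros(6)
mask = torch.tensor([True, True, False, True, False, False])  # selects 3 slots
values = torch.ones(5)  # 5 values offered for 3 slots

try:
    # Masked assignment needs exactly one value per selected slot
    # (or a value tensor that can be broadcast to that count).
    embeddings[mask] = values
    error_message = None
except (RuntimeError, IndexError) as err:
    error_message = str(err)

print(error_message)
```

If the real failure follows this pattern, the two numbers in the message would be the count of values being written and the count of slots available for them, so a full traceback showing the failing line is the quickest way to confirm it.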

HuggingFaceM4 org

hi @yeargun
do you happen to have a full traceback?

HuggingFaceM4 org

same question @sipie800, can you share a full traceback?

On my side the problem got fixed somehow, but I can't recall how. It might have been a transformers version issue.
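
Since a version mismatch is suspected, a quick way to report the installed versions alongside a traceback is sketched below. It uses only the standard library, and `installed_version` is a hypothetical helper name introduced here for illustration, not part of transformers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions most relevant to this thread.
for name in ("transformers", "torch"):
    print(name, installed_version(name) or "not installed")
```

Posting this output together with the full traceback makes it easier for others to tell whether upgrading transformers is what resolved the error.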
