The expanded size of the tensor (625) must match the existing size (129) at non-singleton dimension 3.

#5
by Koshti10 - opened

Hello. I am trying to run the code provided in the model card, and it throws the error above.

python - 3.10
transformers - 4.39.1

CODE

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"

question = "What are these?"
prompt = f"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>\n{question}###Assistant:"

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)

processor = AutoProcessor.from_pretrained(model_id)

raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))

FULL ERROR

Traceback (most recent call last):
File "/project/kkoshti/clembench/../inferences/vipllava.py", line 26, in <module>
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/models/vipllava/modeling_vipllava.py", line 473, in forward
outputs = self.language_model(
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
outputs = self.model(
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1016, in forward
layer_outputs = decoder_layer(
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 739, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/project/kkoshti/envs/clem/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 670, in forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (625) must match the existing size (129) at non-singleton dimension 3. Target sizes: [1, 32, 1, 625]. Tensor sizes: [1, 1, 1, 129]

The pipeline code produces the same type of error.
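
For anyone decoding the last line of the traceback: the message states the expand/broadcast rule, under which a dimension can only be stretched when its existing size is 1. Below is a minimal pure-Python sketch of that rule applied to the two shapes from the error. The helper name `can_expand` is hypothetical (it is not a PyTorch API), and reading the smaller tensor as the attention mask is an assumption based on where the error is raised.

```python
def can_expand(existing, target):
    """Return True if a tensor of shape `existing` could be expanded to `target`.

    Shapes are aligned from the trailing dimension; each existing dimension
    must either equal the target size or be 1 (a singleton).
    """
    if len(existing) > len(target):
        return False
    # Walk both shapes from the right; missing leading dims act like size 1.
    for have, want in zip(reversed(existing), reversed(target)):
        if have != want and have != 1:
            return False
    return True


# Shapes copied from the RuntimeError above.
tensor_shape = (1, 1, 1, 129)   # presumably the attention mask (assumption)
target_shape = (1, 32, 1, 625)  # target sizes reported by the error

# Fails on the last dimension: 129 is neither 625 nor 1.
print(can_expand(tensor_shape, target_shape))

# A last dimension of 625 would pass; the second dimension (1) stretches to 32.
print(can_expand((1, 1, 1, 625), target_shape))
```

So the failure is a length mismatch in the last dimension (129 vs 625), not a problem with the batch or head dimensions.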

I encountered the same error.
Downgrading transformers to 4.38.2 resolves it.
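
A minimal sketch of a guard that reports the installed version before running the model. This thread only confirms the error on 4.39.1 and that 4.38.2 works, so treating exactly 4.39.1 as affected is an assumption; `AFFECTED`, `is_affected`, and `check_transformers` are hypothetical names used for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: only the version reported in this thread is treated as affected.
AFFECTED = {"4.39.1"}


def is_affected(installed):
    """Return True if the given version string is one reported as failing here."""
    return installed in AFFECTED


def check_transformers():
    """Describe whether the installed transformers version matches the report."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if is_affected(installed):
        return (
            f"transformers {installed} is reported to fail here; "
            "try: pip install transformers==4.38.2"
        )
    return f"transformers {installed} is not reported as affected in this thread"


print(check_transformers())
```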

Llava Hugging Face org

Feel free to open an issue on the Transformers library with code to reproduce the error, along with your environment settings.
