Error while running on MPS

#5
by drag88 - opened

How can I fix the error below, which occurs when using the MPS device?

Code:

from transformers import pipeline

model_id = "llava-hf/bakLlava-v1-hf"
pipe = pipeline("image-to-text", model=model_id, device="mps", framework="pt")
image = df['Product Image Link'][1000]
max_new_tokens = 200
prompt = "USER: <image>\nWrite a detailed product description for the product in the image for a customer planning to buy this product?\nASSISTANT:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 1000})
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[79], line 4
      1 max_new_tokens = 200
      2 prompt = "USER: <image>\nWrite a detailed product description for the product in the image for a customer planning to buy this product?\nASSISTANT:"
----> 4 outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 1000})

File ~/miniconda3/envs/imgtotext/lib/python3.9/site-packages/transformers/pipelines/image_to_text.py:111, in ImageToTextPipeline.__call__(self, images, **kwargs)
     83 def __call__(self, images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs):
     84     """
     85     Assign labels to the image(s) passed as inputs.
     86 
   (...)
    109         - **generated_text** (`str`) -- The generated text.
    110     """
--> 111     return super().__call__(images, **kwargs)

File ~/miniconda3/envs/imgtotext/lib/python3.9/site-packages/transformers/pipelines/base.py:1140, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1132     return next(
   1133         iter(
   1134             self.get_iterator(
   (...)
   1137         )
   1138     )
   1139 else:
...
    315     )
    317 final_embedding[image_to_overwrite] = image_features.contiguous().reshape(-1, embed_dim)
    318 final_attention_mask |= image_to_overwrite

ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
Llava Hugging Face org

Could you share the actual snippet? I cannot run this if I don't know which dataset you used.

I ran into this as well. It seems to be caused by this bug when calling cumsum on a bool tensor: https://github.com/pytorch/pytorch/issues/96614
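If that upstream issue is indeed the cause, one common workaround pattern (a sketch, not a confirmed fix inside the pipeline itself) is to cast the bool tensor to an integer dtype before calling cumsum, which sidesteps the bool-cumsum code path. Shown here on CPU for illustration:

```python
import torch

# A boolean mask like the image-token mask LLaVA builds internally
mask = torch.tensor([True, False, True, True])

# Workaround: cast bool -> int before cumsum instead of calling
# cumsum directly on the bool tensor (the op that misbehaves on MPS
# per the linked PyTorch issue)
positions = torch.cumsum(mask.int(), dim=-1)

print(positions.tolist())  # [1, 1, 2, 3]
```

Applying this in practice would mean patching the model code (or upgrading to a transformers/PyTorch version where it is already handled), since the cumsum call happens inside the library.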

Llava Hugging Face org

main might have fixed this, btw. Sorry for the late reply.
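To try the fix before it lands in a release, installing transformers from source is the usual route (standard pip syntax; whether the fix is actually on main is the assumption here):

```shell
# Install the current development version of transformers from GitHub
pip install --upgrade git+https://github.com/huggingface/transformers.git
```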

No worries. Unfortunately, the issue still seems to persist.

Did anyone find a solution to this? I'm also facing this.

I don't think so. Instead, I used the LLaVA model from Ollama, which was a breeze to run.
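For anyone wanting to try that route, the Ollama CLI makes this a two-command affair (assumes Ollama is installed; `llava` is the model name in the Ollama library, and the image path is a hypothetical local file):

```shell
# Pull the LLaVA model from the Ollama library, then run it on a local image;
# multimodal models in Ollama accept image file paths in the prompt text
ollama pull llava
ollama run llava "Write a detailed product description for the product in ./product.jpg"
```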
