Support for multiple images

#19
by wamozart - opened

I'm trying to pass multiple images in the prompt and ask the model to find the differences between these two images.
import requests
from PIL import Image

image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)
images = [image1, image2]
prompt = """
[INST] <image>\nYou are given two images; determine whether they are the same image or not, and explain why. [/INST]
"""
It seems to ignore the second image. Any suggestions?

Llava Hugging Face org

Hey!

Yes, LLaVa-NeXT can accept multiple images as input, as shown here. But since the model was not pre-trained with several images interleaved in one prompt, it might not perform well.

I recommend fine-tuning it for your use case if you want decent quality when generating based on several images.
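For reference, here is a minimal sketch of multi-image inference with the mistral-7b checkpoint (url1/url2 stand in for your own image URLs; other checkpoints use a different prompt format, so check the docs for those):

import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)

# One <image> placeholder per input image, in the same order as the images list
prompt = "[INST] <image>\n<image>\nAre these two images the same? Explain your answer. [/INST]"

inputs = processor(images=[image1, image2], text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))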

How should I use this model to generate captions for 3 million images? What resources should I use (and where should I run it)? What would the compute cost be? What parallelization should I use?

Llava Hugging Face org

@LBS-LENKA you can use TGI to serve it, which comes with many optimizations under the hood: https://github.com/huggingface/text-generation-inference
Alternatively, I'm building this project for optimizing vision/multimodal models; you can find recipes inside depending on your hardware: https://github.com/merveenoyan/smol-vision
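If you prefer plain transformers over TGI, here is a rough sketch of batched captioning (the batch contents, prompt, and caption_batch helper are hypothetical; for 3M images you would shard the file list across GPUs and run one process per shard):

import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
processor.tokenizer.padding_side = "left"  # left-pad decoder-only models for batched generation
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] <image>\nDescribe this image in one sentence. [/INST]"

def caption_batch(paths):
    # Same prompt for every image; pad so prompts of different lengths batch together
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(
        images=images, text=[prompt] * len(images), padding=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(out, skip_special_tokens=True)

Left padding matters here: with right padding, generated tokens for the shorter prompts would come after pad tokens. Since shards are independent, throughput scales roughly linearly with the number of GPUs.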

Hi, I was trying out the example given here: https://huggingface.co/docs/transformers/main/en/model_doc/llava_next#multi-image-inference

But I am getting an error while trying to apply chat template. Below are the code and the error:

Code:

import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
from PIL import Image
import requests

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)
model.to(device)

url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_stop = Image.open(requests.get(url, stream=True).raw)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_cats = Image.open(requests.get(url, stream=True).raw)

url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.jpg"
image_snowman = Image.open(requests.get(url, stream=True).raw)

# Prepare a batch of two prompts, where the first one is a multi-turn conversation and the second is not
conversation_1 = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
            ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "There is a red stop sign in the image."},
            ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What about this image? How many cats do you see?"},
            ],
    },
]

conversation_2 = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
            ],
    },
]

prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
prompts = [prompt_1, prompt_2]

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 46
     13 conversation_1 = [
     14     {
     15         "role": "user",
   (...)
     33     },
     34 ]
     36 conversation_2 = [
     37     {
     38         "role": "user",
   (...)
     43     },
     44 ]
---> 46 prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
     47 prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
     48 prompts = [prompt_1, prompt_2]

File /opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py:926, in ProcessorMixin.apply_chat_template(self, conversation, chat_template, tokenize, **kwargs)
    924         chat_template = self.default_chat_template
    925     else:
--> 926         raise ValueError(
    927             "No chat template is set for this processor. Please either set the `chat_template` attribute, "
    928             "or provide a chat template as an argument. See "
    929             "https://huggingface.co/docs/transformers/main/en/chat_templating for more information."
    930         )
    931 return self.tokenizer.apply_chat_template(
    932     conversation, chat_template=chat_template, tokenize=tokenize, **kwargs
    933 )

ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
Llava Hugging Face org

@biswadeep49 which version of transformers do you have? You need at least v4.43 for processor chat templates; that's when we added support for them.
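A quick way to check your installed version (upgrade command shown as a comment, assuming pip):

import transformers

# Processor chat templates need a recent release; upgrade with:
#   pip install -U transformers
print(transformers.__version__)  # should be >= 4.43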

Hi, I also ran into the same issue: ValueError: No chat template is set for this processor. Please either set the chat_template attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
I'm using transformers 4.45.0. Any suggestions?
Thank you.

Llava Hugging Face org

@zcchen I just verified that the templates work in the latest version from main and in the latest patch release. If you're in a Jupyter notebook, you might need to restart the kernel; it sometimes happens that the package isn't updated until the kernel restarts.

Also, I recommend using v4.44.2 for now, as the main branch is under refactoring and might give some errors. I am working on it, but the PR is not merged yet.

the error "No chat template is set for this processor" still exists when using v4.44.2
