Support for multiple images
I'm trying to pass multiple images in the prompt and ask the model to find the differences between these two images.
image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)
images = [image1, image2]
prompt = """
[INST] \nYou are given two images; determine whether they are the same image or not and give the reason. [/INST]
"""
It seems to ignore the second image. Any suggestion?
Hey!
Yes, LLaVA-NeXT can accept multiple images as input, as shown here. But since the model was not pre-trained with several images interleaved in one prompt, it might not perform well.
I recommend fine-tuning it for your use case if you want decent quality when generating from several images.
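For reference, here is a minimal sketch of interleaved multi-image inference (assuming the llava-hf/llava-v1.6-mistral-7b-hf checkpoint and its [INST] ... [/INST] prompt format; one <image> placeholder is needed per image, in the same order as the images passed to the processor):
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model.to("cuda")

# example images; substitute your own URLs
url1 = "https://www.ilankelman.org/stopsigns/australia.jpg"
url2 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)

# one <image> token per image, in the order the images are passed below
prompt = "[INST] <image>\n<image>\nAre these two images the same? Explain why or why not. [/INST]"

inputs = processor(images=[image1, image2], text=prompt, return_tensors="pt").to("cuda", torch.float16)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
The quality caveat above still applies: without fine-tuning, the model may conflate or ignore one of the images.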
How should I use this model to generate captions for 3 million images? What resources should I use (and where should I run it)? What would the compute cost be? What parallelization should I use?
@LBS-LENKA
you can use TGI to serve it, which comes with many optimizations under the hood: https://github.com/huggingface/text-generation-inference
I'm also building this project to optimize vision/multimodal models; you can find recipes inside depending on your hardware: https://github.com/merveenoyan/smol-vision
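If you go the TGI route, here is a rough sketch of querying a running endpoint from Python (this assumes you have already started TGI with --model-id llava-hf/llava-v1.6-mistral-7b-hf listening on localhost:8080, and that the image-in-prompt markdown syntax from TGI's vision-language model docs applies to your TGI version, so double-check both):
import requests

# hypothetical local endpoint; adjust host/port to your deployment
TGI_URL = "http://localhost:8080/generate"

# TGI vision-language prompts embed images via markdown-style ![](url) references
payload = {
    "inputs": "![](https://www.ilankelman.org/stopsigns/australia.jpg)Describe this image in one sentence.",
    "parameters": {"max_new_tokens": 64},
}

response = requests.post(TGI_URL, json=payload, timeout=120)
print(response.json()["generated_text"])
For millions of images you would typically shard the image list across several TGI replicas and send many concurrent requests per replica; the throughput you get per GPU is what drives the overall cost.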
Hi, I was trying out the example given here: https://huggingface.co/docs/transformers/main/en/model_doc/llava_next#multi-image-inference
But I am getting an error while trying to apply chat template. Below are the code and the error:
Code:
import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, AutoProcessor, AutoTokenizer
from PIL import Image
import requests
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model.to(device)
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_stop = Image.open(requests.get(url, stream=True).raw)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_cats = Image.open(requests.get(url, stream=True).raw)
url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.jpg"
image_snowman = Image.open(requests.get(url, stream=True).raw)
# Prepare a batch of two prompts, where the first one is a multi-turn conversation and the second is not
conversation_1 = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "There is a red stop sign in the image."},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What about this image? How many cats do you see?"},
        ],
    },
]
conversation_2 = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]
prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
prompts = [prompt_1, prompt_2]
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[17], line 46
13 conversation_1 = [
14 {
15 "role": "user",
(...)
33 },
34 ]
36 conversation_2 = [
37 {
38 "role": "user",
(...)
43 },
44 ]
---> 46 prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
47 prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
48 prompts = [prompt_1, prompt_2]
File /opt/conda/lib/python3.10/site-packages/transformers/processing_utils.py:926, in ProcessorMixin.apply_chat_template(self, conversation, chat_template, tokenize, **kwargs)
924 chat_template = self.default_chat_template
925 else:
--> 926 raise ValueError(
927 "No chat template is set for this processor. Please either set the `chat_template` attribute, "
928 "or provide a chat template as an argument. See "
929 "https://huggingface.co/docs/transformers/main/en/chat_templating for more information."
930 )
931 return self.tokenizer.apply_chat_template(
932 conversation, chat_template=chat_template, tokenize=tokenize, **kwargs
933 )
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
@biswadeep49 which version of transformers do you have? You need at least v4.43 for chat templates; that's when we added support for them.
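To check what you have installed (and upgrade if needed), run something like:
import transformers
print(transformers.__version__)
# if this prints something older than 4.43, upgrade with:
#   pip install -U "transformers>=4.43"
# and restart your kernel / interpreter afterwards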
Hi, I also ran into the same issue: ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
I'm using transformers 4.45.0. Any suggestions?
Thank you.
@zcchen
I just verified that the templates work in the latest version from main and in the latest patch release. If you're in a Jupyter notebook, you might need to restart the kernel; it sometimes happens that the package isn't updated until the kernel restarts.
Also, I recommend using v4.44.2 for now, as the version on the main branch is under refactoring and might give some errors. I am working on it, but the PR is not merged yet.
The error "No chat template is set for this processor" still exists when using v4.44.2.