Batch inputs (image, prompt)

#10
by jeeyungk - opened

Can we use a batch of images as input to LLaVA?

Llava Hugging Face org

Hi! Yes, LLaVA-1.5 can take batched inputs, see the code snippet below:

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# One <image> placeholder per prompt; images are matched to prompts in order
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me? ASSISTANT:",
    "USER: <image>\nWhat is this? ASSISTANT:",
]

image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# padding=True is required so prompts of different lengths can be batched;
# move the inputs to the model's device and cast pixel values to float16
inputs = processor(prompts, images=[image1, image2], padding=True, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(output, skip_special_tokens=True))
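Note that for a decoder-only model like LLaVA, `generate` returns the prompt tokens followed by the newly generated tokens, so the decoded strings contain the full prompt template. A minimal sketch of recovering just each assistant answer by splitting on the `ASSISTANT:` marker (the `decoded` strings below are illustrative placeholders, not real model output):

```python
# Decoded generations keep the prompt template, so splitting on the
# "ASSISTANT:" marker isolates each answer. Placeholder strings stand in
# for the output of processor.batch_decode(...).
decoded = [
    "USER: \nWhat is this place? ASSISTANT: A lakeside pier in the mountains.",
    "USER: \nWhat is this? ASSISTANT: Two cats sleeping on a couch.",
]
answers = [text.split("ASSISTANT:")[-1].strip() for text in decoded]
print(answers)  # ['A lakeside pier in the mountains.', 'Two cats sleeping on a couch.']
```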
