Batch inputs (image, prompt)

by jeeyungk - opened

Can we use a batch of images as input to LLaVA?

Llava Hugging Face org

Hi! Yes, Llava-1.5 can take batched inputs. See the code snippet below:

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16, device_map="auto")
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

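prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me? ASSISTANT:",
    "USER: <image>\nWhat is this? ASSISTANT:",
]

# The image URLs are missing from this post; put your own image URLs in the empty strings.
image1 = Image.open(requests.get("", stream=True).raw)
image2 = Image.open(requests.get("", stream=True).raw)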

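# Pass prompts and images by keyword, then move the batch to the model's device and dtype.
inputs = processor(text=prompts, images=[image1, image2], padding=True, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=20)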
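
To read the results, the generated token IDs still need to be decoded. Here is a minimal sketch of one way to do that, assuming you decode with `processor.batch_decode(output, skip_special_tokens=True)` and that each decoded string still contains the prompt followed by the answer. The helper name `extract_answers` and the sample strings are hypothetical and only there for illustration; they are not real model output.

```python
def extract_answers(decoded_texts, marker="ASSISTANT:"):
    """Return the text after the last `marker` in each decoded string."""
    return [text.split(marker)[-1].strip() for text in decoded_texts]


# With the snippet above, the decoded strings would come from:
#   decoded = processor.batch_decode(output, skip_special_tokens=True)
# The strings below are made-up stand-ins so the helper runs without the model.
decoded = [
    "USER:  \nWhat is this? ASSISTANT: A placeholder answer.",
    "USER:  \nWhat should I bring? ASSISTANT: Another placeholder answer.",
]

for answer in extract_answers(decoded):
    print(answer)
```

Splitting on the last "ASSISTANT:" keeps only the model's reply for each prompt, and the answers come back in the same order as the prompts and images in the batch.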

