ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

by bghira - opened Apr 14, 2024

Discussion

bghira

Apr 14, 2024

error with example code.

bghira

Apr 14, 2024

ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

keunseop

Apr 17, 2024

Check your prompt template

bghira

Apr 17, 2024

i'm using the exact demo code from the model card

keunseop

Apr 17, 2024

I also used the demo code as is and received the above error message when I entered the wrong prompt format.

bghira

Apr 17, 2024

i don't understand what you're trying to say.

i used the model code from the 1.6-34b card, which we are on the community page for.

it has the system prompt built in.

are you in the right place?

keunseop

Apr 17, 2024

Yes, I used the demo code as is and it worked fine, but I modified the prompt incorrectly and the error above occurred.

bghira

Apr 17, 2024

i think your issue was different. i have not modified anything. i simply copy/paste the code and execute it, and I receive the error.

Ego-J

Apr 17, 2024

i think your issue was different. i have not modified anything. i simply copy/paste the code and execute it, and I receive the error.

same problem!

bghira

Apr 17, 2024

and i'm using the Git version of Transformers. no difference between release version or Git main.

bghira

Apr 17, 2024

i almost don't believe that @keunseop got the 34b model even running. are you sure you didn't switch it to Vicuna or something?

Ego-J

Apr 17, 2024

This comment has been hidden

Ego-J

Apr 17, 2024

Seems like '< image >: 64000' is not in the input_ids encoded by the processor with the demo prompt

keunseop

Apr 17, 2024

This comment has been hidden

keunseop

Apr 17, 2024

•

edited Apr 17, 2024

To compare inference speeds,I ran both the mistral 7b model and the 34b model on four v100 GPUs.

bghira

Apr 17, 2024

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 4.40.0.dev0 requires tokenizers<0.19,>=0.14, but you have tokenizers 0.19.0 which is incompatible.

even trying to install the latest Tokenizers library (i was running 15.2) doesn't work with the latest Transformers main branch.

what a wild thing to observe, considering both projects are from the same team and rely on each other so heavily

bghira

Apr 17, 2024

INFO:root:Processing image: anime-summerghost-54.png, data: <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1920x1080 at 0x16FC842E0>
INFO:root:Using LLaVA 1.6+ model.
INFO:root:Inputs: {'input_ids': tensor([[59603,  9334,  1397,   562, 13310,  2756,   597,   663, 15874, 10357,
         14135,    98,   707, 14135,  3641,  6901,    97,  7283,    97,   597,
         31081,  8476,   592,   567,  2756, 59610, 59575,  3275,    98,  2134,
          1471, 59601, 59568, 64000,   144,  5697,   620,  2709,   594,   719,
          2728,   100, 39965,  8898,  9129, 59601]], device='mps:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='mps:0'), 'pixel_values': tensor([[[[[ 1.3464,  1.3464,  1.3464,  ...,  0.0325,  0.1201,  0.1493],

mine has 64000 in there but it still doesn't work, even though i switched the processor config to use_fast=False

bghira

Apr 17, 2024

https://github.com/huggingface/transformers/issues/29835

bghira

Apr 17, 2024

@keunseop so again i wonder how you got this working when it has never worked

nielsr

Llava Hugging Face org Apr 17, 2024

Will investigate, thanks for reporting

keunseop

Apr 18, 2024

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, BitsAndBytesConfig
import torch
from PIL import Image
import requests


quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")

model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-34b-hf", quantization_config=quantization_config, device_map="auto") 
#model.to("cuda:0")

# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"

inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")

# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)

print(processor.decode(output[0], skip_special_tokens=True))

@ptx0

Added quantization code for inference on multiple GPUs.

zhizhou57

Apr 18, 2024

i meet the same problem

SsssOvO

Jun 28, 2024

Me too.

RaushanTurganbay

Llava Hugging Face org Jun 28, 2024

Should be solved in the latest version of transformers. Can you confirm you still observe the bug after updating?

Also, I believe there was a similar problem when to using "mps" device, see https://github.com/huggingface/transformers/issues/30294 for details

xfdu1

Jul 1, 2024

same problem, upgrade transformers to 4.42.3 does not solve the issue

jugg1024

Jul 2, 2024

•

edited Jul 2, 2024

i found that the token_index of <image> in the added_tokens.json is 64003, but default image_token_index is 64000.
so i add one line to the demo code, then i worked.

inputs['input_ids'][inputs['input_ids'] == 64003] = 64000

nielsr

Llava Hugging Face org Jul 2, 2024

Hi yes this is being discussed here: https://github.com/huggingface/transformers/issues/31713

nielsr

Llava Hugging Face org Jul 6, 2024

Rolled back the commits to make sure it works. The updates were related to adding the chat template, which @RaushanTurganbay will take care off when she's back

Narrator5000

Aug 9, 2024

•

edited Aug 10, 2024

I'm using "llava-hf/llava-v1.6-mistral-7b-hf" and just got rid of this error on my code. Double-check that you're always using LlavaNext where possible. I was using LlavaForConditionalGeneration instead of LlavaNextForConditionalGeneration.

azrai99

Oct 18, 2024

@bghira You need to use the correct chat template I guess. It works for me.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment