8bit model always returns empty string

#26

by zwang2022 - opened Feb 19, 2024

zwang2022

Feb 19, 2024

I tried the following code on both my personal computer and Kaggle, and it always returned an empty string.

I tried replacing the image and the prompt. In most cases it still returned an empty string; only in a few cases did it return random words.

# pip install accelerate bitsandbytes
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map="auto")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg' 
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True).strip())

ybelkada

Feb 21, 2024

Hi @zwang2022 !
Hmm, interesting.
Can you try with the latest transformers and bitsandbytes? pip install -U transformers bitsandbytes
Also, do you face the same issue with 4-bit?
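A minimal sketch for comparing the two quantized modes against fp16. It assumes a recent transformers release in which `from_pretrained` accepts `quantization_config=BitsAndBytesConfig(...)`; the bare `load_in_8bit=True` / `load_in_4bit=True` keywords in the first post are the older spelling. The helpers `quantization_kwargs` and `load_blip2` are hypothetical names. torch and transformers are imported inside the loader only so that the config mapping can be checked without a GPU.

```python
# Hypothetical helpers for switching between 8-bit, 4-bit and fp16 loading.
MODEL_ID = "Salesforce/blip2-opt-2.7b"


def quantization_kwargs(mode: str) -> dict:
    """Return keyword arguments for transformers.BitsAndBytesConfig ({} means fp16)."""
    if mode == "8bit":
        return {"load_in_8bit": True}
    if mode == "4bit":
        # Compute in fp16 to match the dtype the inputs are cast to.
        return {"load_in_4bit": True, "bnb_4bit_compute_dtype": "float16"}
    if mode == "fp16":
        return {}
    raise ValueError(f"unknown mode: {mode!r}")


def load_blip2(mode: str):
    """Load the BLIP-2 processor and model in the requested mode."""
    # Lazy imports: quantization_kwargs() stays usable without torch installed.
    import torch
    from transformers import (
        BitsAndBytesConfig,
        Blip2ForConditionalGeneration,
        Blip2Processor,
    )

    processor = Blip2Processor.from_pretrained(MODEL_ID)
    kwargs = quantization_kwargs(mode)
    if kwargs:
        model = Blip2ForConditionalGeneration.from_pretrained(
            MODEL_ID,
            quantization_config=BitsAndBytesConfig(**kwargs),
            device_map="auto",
        )
    else:
        model = Blip2ForConditionalGeneration.from_pretrained(
            MODEL_ID, torch_dtype=torch.float16, device_map="auto"
        )
    return processor, model
```

Usage would be `processor, model = load_blip2("4bit")` followed by the same `processor(...)` / `model.generate(...)` calls, so that only the precision changes between runs.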

yeyimilk

Feb 21, 2024


edited Feb 21, 2024

Hi @ybelkada ,
I also tried the full float16 version on a new machine with an NVIDIA A40, and the same issue happened.
Eventually I found that if the image or the prompt was not set up properly, BLIP-2 refused to output anything.

I was simply unlucky: the first image and prompt I tried happened to be a combination for which BLIP-2 produced no output.

When I appended "Answer: " to the end of the prompt, the BLIP-2 model was more willing to answer, but with that same prompt, around 50 of 1000 images still got no answer.

Also, BLIP-2's text understanding and generation abilities are limited, so I believe this is not an issue with the code example but rather with BLIP-2's inherent chat ability.
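To put a number like "around 50 of 1000" on a prompt change, a small tally of empty decoded outputs is enough. A minimal sketch; `unanswered_rate` is a hypothetical helper, and it assumes the answers were already decoded with `skip_special_tokens=True`, as in the first post.

```python
from typing import Iterable


def unanswered_rate(answers: Iterable[str]) -> float:
    """Fraction of decoded answers that are empty after stripping whitespace."""
    answers = list(answers)
    if not answers:
        return 0.0
    empty = sum(1 for a in answers if not a.strip())
    return empty / len(answers)


# One of four answers is whitespace only, so it counts as unanswered.
print(unanswered_rate(["2", "  ", "a dog", "beach"]))  # 0.25
```

Running the same image set once per prompt variant and comparing these rates makes it easier to tell a prompt effect from noise.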

soumya1729

Apr 18, 2024

Same issue. Is anyone looking into this?

nielsr

Apr 19, 2024

I'm able to reproduce this. cc @ybelkada

soumya1729

Apr 20, 2024

Same with 4-bit too. It works perfectly for the same image in full precision, though. I have noticed this in multiple cases.

import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", load_in_4bit=True, device_map="auto")

raw_image = Image.open("01256.png").convert('RGB')
# Captioning without a text prompt; inputs are cast to fp16 on the GPU
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=False).strip())

This returns an empty string.

But in full precision, it works perfectly and returns: "dog with a caption that says when your debt card declines at the clinic and they have to put the baby back in"

nielsr

Apr 22, 2024

Opened an issue for it here: https://github.com/huggingface/transformers/issues/30383

MaulikMadhavi

May 14, 2024

import requests
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-6.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-6.7b", device_map="auto"
)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Adjusting the prompt as per paper:
question = "Question: how many dogs are in picture? Answer:"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda")

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True).strip())

# Normal question
question = "how many dogs are in picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda")

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True).strip())

This gives me "1" and an empty string, respectively. It seems generation expects a specific prompt format.
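The prompt that produced "1" follows the `Question: ... Answer:` template. A minimal sketch of a helper that wraps a bare question in that template before it is passed to the processor; `to_vqa_prompt` is a hypothetical name, and it only collapses stray whitespace and avoids adding the prefix or suffix twice.

```python
def to_vqa_prompt(question: str) -> str:
    """Wrap a bare question in the "Question: ... Answer:" template."""
    q = " ".join(question.split())  # collapse stray whitespace and newlines
    if not q.startswith("Question:"):
        q = f"Question: {q}"
    if not q.endswith("Answer:"):
        q = f"{q} Answer:"
    return q


print(to_vqa_prompt("how many dogs are in picture?"))
# Question: how many dogs are in picture? Answer:
```

Because the helper is idempotent, it can be applied to every prompt in a batch without checking which ones were already formatted.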

mmiller89

Aug 29, 2024

I'm not using 8-bit, just fp16, but I see the same thing.

When I added "Answer: " to the end of the prompt, it said nothing.

When I added "Answer: " to the end AND put "Please" at the start of the prompt, it actually gave an answer, but a very short one ("woman sitting on the beach").

When I used "Please" without "Answer: " at the end, the response was hilarious: "Don't just say its a photo."
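The variants tried here (with and without a leading "Please", with and without a trailing "Answer:") can be enumerated so that each one is run against the same set of images. A minimal sketch; `prompt_variants` is a hypothetical helper, and the variant wording is taken only from the comments in this thread.

```python
from itertools import product


def prompt_variants(question: str) -> list[str]:
    """All combinations of an optional "Please" prefix and "Answer:" suffix."""
    variants = []
    for please, answer in product([False, True], repeat=2):
        prompt = question
        if please:
            prompt = f"Please {prompt}"
        if answer:
            prompt = f"{prompt} Answer:"
        variants.append(prompt)
    return variants


for p in prompt_variants("describe the image."):
    print(p)
# describe the image.
# describe the image. Answer:
# Please describe the image.
# Please describe the image. Answer:
```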
