Confidence scores for image captioning?

#13

by acmidev - opened Aug 17, 2023

Discussion

acmidev

Aug 17, 2023

Hi there,

I was wondering how to generate confidence scores when generating image captions with the sample code.

Best, Simon.

from PIL import Image
import requests
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
)
model.to(device)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)

generated_ids = model.generate(**inputs)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)
two cats laying on a couch

nielsr

Aug 17, 2023

Hi,

You can obtain a confidence score by passing output_scores=True and return_dict_in_generate=True to the generate() method.

# with return_dict_in_generate=True, the token ids are in outputs.sequences
outputs = model.generate(**inputs, output_scores=True, return_dict_in_generate=True)
scores = outputs.scores

According to the docs:

In the case of greedy decoding, this contains the processed prediction scores of the language modeling head (scores for each vocabulary token before SoftMax) at each generation step. It is a tuple of torch.FloatTensor with up to max_new_tokens elements (one element for each generated token), with each tensor of shape (batch_size, config.vocab_size).
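To make the "before SoftMax" part concrete, here is a minimal pure-Python sketch of how raw scores for one generation step turn into probabilities. The four-entry `step_logits` list is a made-up stand-in for one row of `outputs.scores`, and the `softmax` helper is written here for illustration only; with real tensors you would call `.softmax(-1)` instead.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a 4-token vocabulary at a single generation step.
step_logits = [2.0, 1.0, 0.1, -1.0]
step_probs = softmax(step_logits)

# Index and probability of the most likely token at this step.
best_token = max(range(len(step_probs)), key=step_probs.__getitem__)
best_prob = step_probs[best_token]
print(best_token, round(best_prob, 3))
```

The raw scores themselves are not probabilities, so they should not be multiplied or averaged directly; only the values after the softmax can be read that way.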

To calculate a probability for the entire sequence, you could do the following:

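# get the probability of the most likely token at each generation step
# (under greedy decoding, this is the token that was actually generated)
topks = [s.softmax(-1).topk(1) for s in outputs.scores]

probs = []
for tk in topks:
    # take the first item in the batch only
    probs.append(tk.values.view(-1)[0].item())

# multiply the per-token probabilities
sequence_prob = torch.tensor(probs).prod()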
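One thing to keep in mind with a product of per-token probabilities: it shrinks with every extra token, so even a caption where each token is fairly likely can end up with a joint probability of only a few percent. A length-normalized value (the geometric mean of the token probabilities) is often easier to read as a confidence. The sketch below shows both in plain Python; the `token_probs` values are hypothetical and `sequence_confidence` is an illustrative helper, not part of transformers.

```python
import math

def sequence_confidence(token_probs):
    """Return (joint probability, length-normalized per-token probability)."""
    # Sum log-probabilities instead of multiplying, to avoid underflow
    # on long sequences.
    log_prob = sum(math.log(p) for p in token_probs)
    joint = math.exp(log_prob)
    # Geometric mean: the "average" probability assigned to each token.
    per_token = math.exp(log_prob / len(token_probs))
    return joint, per_token

# Hypothetical per-token probabilities for a six-token caption.
token_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
joint, per_token = sequence_confidence(token_probs)
print(f"joint: {joint:.4f}, per-token: {per_token:.4f}")
```

Here the joint probability is about 6% even though no single token falls below 40%, which is why a small sequence-level number does not by itself mean the model is unsure. Recent versions of transformers also appear to offer a `compute_transition_scores` method on generation-capable models that returns the score of each generated token directly; check the documentation for your installed version before relying on it.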

acmidev

Aug 23, 2023

@nielsr Thanks so much for the code - so adding it back to the sample code via this Google Colab the output is:

Prediction: two cats laying on a couch
Confidence: 0.012353635393083096

So does that suggest the confidence level is 1.2%?

shams123321

Mar 15

@nielsr Thanks so much for the code - so adding it back to the sample code via this Google Colab the output is:
Prediction: two cats laying on a couch
Confidence: 0.012353635393083096
So does that suggest the confidence level is 1.2%?

Have you solved this problem? I obtained the same result using the code provided above. If you have solved this problem, I hope you can share your correct code with me. Thank you very much! Good luck to you!

acmidev

Mar 18

@shams123321 I haven't heard back from @nielsr about it yet, and we haven't resolved it unfortunately.
