Very flat output? "Probabilities" all close to zero.

#3
by Moghrua - opened

Using the sample code, the results look a bit strange - the "probabilities" come out almost exactly zero. The scoring function looks like a good match for the original implementation - could there be an issue with the tokenizer somehow?

PyTorch Image Models org

@Moghrua this behaves very differently from softmax, where the output is forced to sum to 1. In many cases you can end up with a lot of low scores if none of the texts is a great match. I've definitely been able to get scores of .5 all the way to .97. Sometimes .1-.2 is a pretty good match.

If you cut and paste the provided beignet example it will output:
Label probabilities: [('a Dog.', 0.0), ('a cat', 0.0), ('a donut', 0.0), ('A Beignet.', 0.517)]
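To see why the numbers can all sit near zero, here's a minimal sketch with made-up logits (not values from the sample code): each label gets an independent sigmoid score, so nothing forces the scores to sum to 1 the way softmax does.

import torch

# Hypothetical image-text logits for four candidate labels
logits = torch.tensor([[-14.0, -12.5, -13.0, 0.07]])

sigmoid_probs = torch.sigmoid(logits)          # independent scores: most ~0, last ~0.52
softmax_probs = torch.softmax(logits, dim=-1)  # forced to sum to 1, last ~1.0

print(sigmoid_probs)
print(softmax_probs)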

PyTorch Image Models org

If you suspect any tokenizer issues, you can double check by comparing with https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb ... I have done some testing and it seemed to compare well, but there could be texts that don't tokenize the same...

I also have a similar concern. I used this image (08710255134840A001.jpeg) with the tags ["a dog", "a cat", "a bird", "a fish"], and the output was:
Label probabilities: [('a dog', 1e-06), ('a cat', 4.4e-05), ('a bird', 0.0), ('a fish', 5e-06)]
Is this model really expected to output such low probabilities?

PyTorch Image Models org

@talrejanikhil I've observed it can be exceedingly fussy / specific as to what's going to yield a high prob ... e.g. twiddle yours a bit:

[('a dog', 0.0), ('a cat on a catfood box', 0.024), ('a catfood box', 0.351), ('a beignet', 0.0)]

So yeah, I think this is the usual behaviour. It also seems a bit sensitive to preprocessing / weight translation; especially when the model is unsure, the prob swings vs. the output of the reference JAX version can be a bit higher than I'd expect. So you could try similar prompts in their notebook...

For example, if you do use softmax, it will obviously push the probs up so they sum to 1.0:
Label probabilities: [('a dog', 0.02), ('a cat', 0.98), ('a beignet', 0.0)]

Yes, that's true. I actually do miss the high probs that the CLIP model outputs.

PyTorch Image Models org

@Moghrua @talrejanikhil I have observed the same behavior, so I had to normalize the outputs here: https://huggingface.co/spaces/merve/multilingual-zero-shot-image-clf
I guess since the zero-shot accuracy is still better than other models (as claimed by the paper), it's just that you need to stretch the outputs to actually see that?

@merve do you have code to show how you normalized the outputs?

PyTorch Image Models org

@merve @talrejanikhil FYI, down to some numerical differences, sigmoid + normalizing like this is essentially softmax.

It looks/feels nicer in that everything adding up to 1.0 must be a probability, but it's pretty obvious there's little to no calibration there. In either case, the sigmoid output is probably more closely calibrated w.r.t. what was seen in the training distribution...
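To make that concrete: for the mostly very negative logits SigLIP tends to produce, sigmoid(x) is approximately exp(x), so dividing the sigmoid outputs by their sum lands very close to a softmax. A quick sketch with made-up logits:

import torch

logits = torch.tensor([-12.0, -9.0, -14.0, -2.0])  # hypothetical image-text logits

sig = torch.sigmoid(logits)
normalized = sig / sig.sum()               # "sigmoid then normalize"
softmax = torch.softmax(logits, dim=-1)

print(normalized)  # very close to...
print(softmax)     # ...the softmax values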

Hi guys, I haven't read all of this, but the model being generally "more conservative" is totally expected. As Ross says, the model is not calibrated, because it's a "raw" model. What calibration makes most sense depends on your data/task. I guess we should explain this more somewhere at some point.

The good news is that calibrating it is very easy. If you have a dataset representative of your task, you can simply adjust the bias value (a single scalar!) by hand or via grid search, so that the probabilities look the way you prefer. I've done this many times on many tasks, and it works flawlessly. Actually, our official SigLIP colab even contains an interactive demo that shows this:
(screenshot of the interactive bias-adjustment demo from the SigLIP colab)
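Roughly the idea, as a toy sketch (made-up pre-bias logits, not the colab's actual code): sweep the scalar bias, watch how the probabilities shift, and keep the value that looks right for your data.

import torch

# Hypothetical pre-bias logits (cosine similarity * logit_scale) for three texts
raw_logits = torch.tensor([-7.0, 2.5, -4.0])

for bias in [-16.0, -12.0, -8.0, -4.0, 0.0]:
    probs = torch.sigmoid(raw_logits + bias)
    print(f'bias={bias:6.1f} -> {[round(p.item(), 3) for p in probs]}')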

My question would still be: how could we do this in Hugging Face? Is there a way to set the bias parameter?

Actually I figured this out myself. You can do something like this:

import torch
from torch import nn
from transformers import AutoModel, AutoProcessor

model_name = 'google/siglip-so400m-patch14-384'
model = AutoModel.from_pretrained(model_name)
# Set your bias value here (overrides the learned scalar logit_bias):
model.logit_bias = nn.Parameter(torch.tensor([-10.0]))
processor = AutoProcessor.from_pretrained(model_name)
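Continuing from the snippet above, scoring with the adjusted bias would look something like this (a sketch; the image path and labels are placeholders, and padding='max_length' is what the SigLIP checkpoints expect):

from PIL import Image
import torch

image = Image.open('example.jpg')  # placeholder path
labels = ['a dog', 'a cat', 'a bird', 'a fish']

inputs = processor(text=labels, images=image, padding='max_length', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image now includes the overridden bias, so the sigmoid probs come out higher
probs = torch.sigmoid(outputs.logits_per_image)
print(list(zip(labels, probs[0].tolist())))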

This significantly increased the probability values for the example I posted above.

PyTorch Image Models org

FWIW the same applies to the OpenCLIP variant of the model; once the model is created, model.logit_bias = nn.Parameter(torch.tensor([-10.0])) will be equivalent.
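For example (a sketch; the hub id below is just an assumption about which checkpoint is meant, swap in whichever SigLIP weights you actually use):

import torch
from torch import nn
from open_clip import create_model_from_pretrained, get_tokenizer

model, preprocess = create_model_from_pretrained('hf-hub:timm/ViT-SO400M-14-SigLIP-384')
tokenizer = get_tokenizer('hf-hub:timm/ViT-SO400M-14-SigLIP-384')

# Same idea as the transformers snippet above: override the learned scalar bias
model.logit_bias = nn.Parameter(torch.tensor([-10.0]))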
