Clarification of model outputs
Dear authors,
Can you please clarify whether the order of scores, when given a positive and a negative prompt, is [negative_prob, positive_prob] or [positive_prob, negative_prob]?
Also, can you please expand on the similarity of an image and caption, output['sim']? Does a low value mean that the caption and image are similar? What values should I expect? I would like to understand whether the values are bounded and whether they can be negative or positive.
Many thanks,
George
Hi George,
For zero-shot it is [negative_prob, positive_prob]. You can see the code here: https://huggingface.co/paige-ai/Prism/blob/main/modeling_prism.py#L309-L310.
For the meaning of the similarity matrix, please refer to the CLIP paper and to contrastive learning in general. In short, these are dot products between image and language features, scaled by temperature.
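Roughly, the computation looks like this (just a sketch of the math, not the actual Prism code; the embedding size and temperature value below are made up, and I'm assuming L2-normalized features as in CLIP):

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings: 1 image and 2 prompts, ordered [negative, positive].
img_emb = F.normalize(torch.randn(1, 512), dim=-1)   # (n_images, dim)
txt_emb = F.normalize(torch.randn(2, 512), dim=-1)   # (n_prompts, dim)
temperature = 0.07                                    # placeholder value

# Similarity scores: dot products between image and language features,
# scaled by temperature. With L2-normalized features these fall roughly in
# [-1/temperature, 1/temperature]; higher means more similar.
sim = img_emb @ txt_emb.t() / temperature             # (n_images, n_prompts)

# Zero-shot probabilities follow the prompt order, so with prompts given as
# [negative, positive] you get [negative_prob, positive_prob].
probs = sim.softmax(dim=-1)
```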
Best,
George
Hi George,
Thank you for the prompt reply and pointer to the modelling code!
So, for two images and one caption, does this mean that the image with the higher similarity (larger temperature-scaled dot product) aligns better with the caption?
Best,
George
Yes, and you can softmax these scores to get probabilities. This is what the contrastive objective does: softmax and then a cross-entropy loss on the resulting probabilities (details in the CLIP paper).
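A minimal sketch of that objective (not the actual training code; the batch size, embedding size, and temperature below are placeholders):

```python
import torch
import torch.nn.functional as F

# Placeholder batch of N paired image/caption embeddings, L2-normalized.
N, dim = 8, 512
img_emb = F.normalize(torch.randn(N, dim), dim=-1)
txt_emb = F.normalize(torch.randn(N, dim), dim=-1)
temperature = 0.07  # placeholder

logits = img_emb @ txt_emb.t() / temperature  # (N, N) similarity matrix
targets = torch.arange(N)                     # matching pairs lie on the diagonal

# Softmax + cross-entropy in both directions: for each image, which caption
# matches it, and for each caption, which image matches it.
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```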
I had only considered applying softmax over the positive vs. negative prompts for a single image. But for multiple images and a single caption, we would be modelling which of the images goes with the caption. I'll check the CLIP paper, but is my intuition correct?
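In other words, something like this (made-up scores, just to check my understanding):

```python
import torch

# Made-up similarity scores (before softmax).
sim_one_image = torch.tensor([[0.2, 1.3]])              # 1 image x [negative, positive] prompts
sim_one_caption = torch.tensor([[0.5], [2.1], [-0.4]])  # 3 images x 1 caption

# Single image, positive vs. negative prompt: softmax over the prompt axis
# gives [negative_prob, positive_prob].
probs_prompts = sim_one_image.softmax(dim=-1)

# Multiple images, single caption: softmax over the image axis answers
# "which of these images goes with the caption".
probs_images = sim_one_caption.softmax(dim=0)
```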
Thanks! It all makes sense now.