Different result between model space and local deployment

#12
by jeff-lee - opened

When I use the same image and query for model inference in Hugging Face Spaces and on my local machine, the results are totally different.

May I know why the results are so different when using the same model?

LAION eV org

@jeff-lee are you using the transformers or open_clip version?

@rwightman
Thank you for your reply

Yes, I'm using the transformers version; the following is my test code.

import torch
from transformers import CLIPModel, CLIPProcessor

target_model = CLIPModel.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K')
target_processor = CLIPProcessor.from_pretrained('laion/CLIP-ViT-H-14-laion2B-s32B-b79K')
target_model.eval()  # inference mode

inputs = target_processor(text=labels, images=images, return_tensors='pt', padding=True).to('cpu')
with torch.no_grad():
    outputs = target_model(**inputs)
logits_per_image = outputs.logits_per_image
probs = logits_per_image.softmax(dim=1).tolist()

result = []
for prob in probs:
    predictions = dict(zip(labels, prob))
    rounded_predictions = {label: round(score, 4) for label, score in predictions.items()}
    result.append(rounded_predictions)


@rwightman Is there any difference between the transformers and open_clip versions?

LAION eV org

@jeff-lee I'm more familiar with the open_clip variants, as I'm involved in that project, but I don't see why there'd be a significant difference in either case. Is it running on CPU in both environments, or on GPU in one and CPU in the other?
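Before digging into the model itself, it can help to quantify how far apart the two environments' outputs actually are and to compare the software stacks. A minimal sketch (the helper name, tolerance, and example numbers below are my own for illustration, not from either library):

```python
import sys

# Hypothetical helper: given two {label: probability} dicts from the two
# environments, return the labels whose scores differ by more than `tol`.
def compare_predictions(a, b, tol=1e-3):
    return {k: (a[k], b[k]) for k in a if abs(a[k] - b[k]) > tol}

# Made-up example numbers: a Spaces run vs. a local run.
space_probs = {"cat": 0.91, "dog": 0.09}
local_probs = {"cat": 0.52, "dog": 0.48}
print(compare_predictions(space_probs, local_probs))

# Also worth comparing in both environments, since version or device
# differences can change preprocessing and numerics:
print(sys.version)  # Python version
# print(torch.__version__), print(transformers.__version__),
# and print(next(target_model.parameters()).device) in each environment.
```

Tiny float-level gaps (1e-5 or so) are expected across hardware; gaps large enough to flip the predicted label usually point at a preprocessing, dtype, or version mismatch rather than the checkpoint.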
