merve
posted an update Jan 16
Google's SigLIP is another alternative to OpenAI's CLIP, and it just got merged into 🤗 transformers and it's super easy to use!
To celebrate this, I have created a repository including notebooks and a bunch of Spaces on various SigLIP-based projects 🥳
Search for art 👉 merve/draw_to_search_art
Compare SigLIP with CLIP 👉 merve/compare_clip_siglip

How does SigLIP work?
SigLIP is a vision-text pre-training technique based on contrastive learning. It jointly trains an image encoder and a text encoder such that the dot product of their embeddings is highest for matching text-image pairs.
The image below is taken from CLIP, where this contrastive pre-training uses a softmax over the batch; SigLIP replaces the softmax with a sigmoid applied to each image-text pair independently. 📎
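
To make the softmax vs. sigmoid difference concrete, here is a minimal, dependency-free sketch of both losses on toy embeddings. It is an illustration under assumptions, not the authors' implementation: the function names, the one-hot toy embeddings and the `scale`/`bias` values are all made up for this example, and real training runs on batched tensors.

```python
import math

def normalize(v):
    """L2-normalize one embedding vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_logits(image_embs, text_embs, scale, bias=0.0):
    """logits[i][j] = scale * <image_i, text_j> + bias on normalized embeddings."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[scale * sum(a * b for a, b in zip(i, t)) + bias for t in txts]
            for i in imgs]

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def sigmoid_loss(logits):
    """SigLIP-style loss: every (image, text) pair is its own binary problem."""
    n = len(logits)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0  # +1 for matching pairs, -1 otherwise
            total += log_sigmoid(label * logits[i][j])
    return -total / n

def log_softmax(row):
    """Stable log-softmax over one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def softmax_loss(logits):
    """CLIP-style loss: cross-entropy over rows and over columns, averaged."""
    n = len(logits)
    cols = [list(c) for c in zip(*logits)]
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    text_to_image = -sum(log_softmax(cols[i])[i] for i in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Toy data (assumption): three one-hot "embeddings", once paired correctly
# and once with the texts rotated so that no pair matches.
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
images = one_hot
texts_matched = one_hot
texts_shuffled = one_hot[1:] + one_hot[:1]

# scale and bias are illustrative values chosen for this sketch.
sig_matched = sigmoid_loss(similarity_logits(images, texts_matched, 10.0, -10.0))
sig_shuffled = sigmoid_loss(similarity_logits(images, texts_shuffled, 10.0, -10.0))
soft_matched = softmax_loss(similarity_logits(images, texts_matched, 10.0))
soft_shuffled = softmax_loss(similarity_logits(images, texts_shuffled, 10.0))

print("sigmoid loss, matched vs shuffled:", sig_matched, sig_shuffled)
print("softmax loss, matched vs shuffled:", soft_matched, soft_shuffled)
```

Both losses are lower when the pairs line up. The difference is that the softmax version has to normalize over a whole row and column of the similarity matrix, while the sigmoid version only ever looks at one pair at a time.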

Highlights from the paper on why you should use it ✨
🖼️📝 Authors used a medium-sized B/16 ViT for the image encoder and a B-sized transformer for the text encoder
😍 More performant than CLIP on zero-shot
🗣️ Authors trained a multilingual model too!
⚡️ Super efficient: the sigmoid loss makes batches of up to 1M items possible, but the authors chose 32k because performance saturates after that

It's super easy to use thanks to transformers 👇
from transformers import pipeline
from PIL import Image
import requests

# load the zero-shot image classification pipeline with the multilingual SigLIP checkpoint
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-base-patch16-256-multilingual")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference: score each candidate label against the image
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
# round the scores for readability
outputs = [{"score": round(output["score"], 4), "label": output["label"]} for output in outputs]
print(outputs)

For all the SigLIP notebooks on similarity search and indexing, you can check this [repository](https://github.com/merveenoyan/siglip) out. 🤗

very cool! link to model checkpoint on the hub: https://huggingface.co/google/siglip-base-patch16-256-multilingual

@merve

the link to the GitHub repository is broken (it contains ')' at the end)
thanks for sharing your work by the way!