Model Overview
This is a SigLIP model pre-trained on WebLI at resolution 224x224. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. SigLIP is CLIP, a multimodal model, with a better loss function. The sigmoid loss scores each image-text pair independently and does not need a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, and the model also performs better at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here.
Both the weights and the Keras model code are released under the Apache 2 License.
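To make the loss described above concrete, here is a minimal NumPy sketch of a pairwise sigmoid loss. It is not the KerasHub or paper implementation: the function name `sigmoid_pairwise_loss` and the default `temperature` and `bias` values are assumptions chosen for illustration, and the embeddings are assumed to be L2-normalized already.

```python
import numpy as np


def sigmoid_pairwise_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """Illustrative pairwise sigmoid loss (hypothetical helper, not a library API).

    image_emb, text_emb: arrays of shape (batch, dim), assumed L2-normalized,
    where row i of each array belongs to the same image-text pair.
    """
    # logits[i, j] scores image i against text j.
    logits = temperature * image_emb @ text_emb.T + bias
    n = logits.shape[0]
    # +1 for matching pairs (the diagonal), -1 for every other pair.
    labels = 2.0 * np.eye(n) - 1.0
    # log(sigmoid(x)) == -logaddexp(0, -x), which is numerically stable.
    log_likelihood = -np.logaddexp(0.0, -labels * logits)
    # Every pair is an independent binary term; there is no softmax over the batch.
    return -log_likelihood.sum() / n


# Toy check with orthonormal embeddings: aligned pairs vs. shuffled pairs.
emb = np.eye(3)
aligned = sigmoid_pairwise_loss(emb, emb)
shuffled = sigmoid_pairwise_loss(emb, np.roll(emb, 1, axis=0))
print(aligned < shuffled)
```

Because each entry of `logits` contributes its own binary term, the loss can be computed on any block of the similarity matrix without normalizing over the whole batch, which is the property the paragraph above refers to.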
Links
- SigLIP Quickstart Notebook
- [SigLIP API Documentation](coming soon)
- SigLIP Model Card
- KerasHub Beginner Guide
- KerasHub Model Publishing Guide
Installation
Keras and KerasHub can be installed with:
pip install -U -q keras-hub
pip install -U -q keras
Jax, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment see the Keras Getting Started page.
Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
Preset name | Parameters | Description |
---|---|---|
Example Usage
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("siglip2_large_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "siglip2_large_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("siglip2_large_patch16_512")

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# load the image and preprocess it into a batch of one
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
siglip({
    "images": image,
    "token_ids": tokens,
})
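The call above returns raw similarity scores. Since SigLIP is trained with a sigmoid loss, one reasonable way to read such scores is to pass each one through a sigmoid independently rather than applying a softmax across the texts. The sketch below shows that step with NumPy only. The helper `logits_to_probabilities` is hypothetical, and the logit values are made up for illustration; they are not real model output. This card does not state the key under which the backbone returns its image-to-text logits, so the helper takes a plain array.

```python
import numpy as np


def logits_to_probabilities(logits):
    """Map raw image-text logits to independent match probabilities.

    Hypothetical helper: applies a numerically stable sigmoid elementwise.
    """
    logits = np.asarray(logits, dtype="float64")
    # sigmoid(x) == exp(-logaddexp(0, -x)); avoids overflow for large |x|.
    return np.exp(-np.logaddexp(0.0, -logits))


labels = ["mountains", "cat on tortoise", "house"]
# Made-up logits for one image against the three texts (shape: images x texts).
example_logits = np.array([[-8.0, 2.5, -6.0]])

probs = logits_to_probabilities(example_logits)
best = labels[int(probs[0].argmax())]
print(best, probs[0])
```

Unlike a softmax, these probabilities do not have to sum to one: an image can match several texts, or none of them.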
Example Usage with Hugging Face URI
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("hf://keras/siglip2_large_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "hf://keras/siglip2_large_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset(
    "hf://keras/siglip2_large_patch16_512"
)

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# load the image and preprocess it into a batch of one
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
siglip({
    "images": image,
    "token_ids": tokens,
})