Model Overview

SigLIP model pre-trained on WebLI at resolution 512x512. It was introduced in the paper Sigmoid Loss for Language Image Pre-Training by Zhai et al. and first released in this repository. SigLIP is a multimodal model like CLIP, but trained with a better loss function: the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, while also improving performance at smaller batch sizes. A TLDR of SigLIP by one of the authors can be found here.
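The pairwise sigmoid loss can be sketched in a few lines of NumPy. This is an illustrative sketch, not KerasHub's actual implementation; the temperature t and bias b are learnable parameters in the real model, and the values here are made up:

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss (sketch). t and b stand in for the
    learnable temperature and bias from the paper."""
    # L2-normalize the embeddings, then form all pairwise similarities.
    img_emb = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = img_emb @ txt_emb.T * t + b
    # Label is +1 on the diagonal (matching pairs), -1 everywhere else.
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0
    # -log sigmoid(label * logit), written as log1p(exp(-x)).
    return np.mean(np.log1p(np.exp(-labels * logits)))

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))  # 4 image embeddings, dim 8
txts = rng.normal(size=(4, 8))  # 4 paired text embeddings
loss = siglip_loss(imgs, txts)
```

Because every pair is scored independently, no batch-wide normalization is needed, which is what lets the batch size scale up (or down) gracefully.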

Weights and Keras model code are released under the Apache 2 License.

Installation

Keras and KerasHub can be installed with:

pip install -U -q keras-hub
pip install -U -q keras

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.

Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

Preset name | Parameters | Description

Example Usage

import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("siglip2_large_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "siglip2_large_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("siglip2_large_patch16_512")

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# load and preprocess the image
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
output = siglip({
    "images": image,
    "token_ids": tokens,
})
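The backbone call above scores the image against each text prompt. Unlike CLIP, which softmaxes the logits over all texts, SigLIP scores each image-text pair independently, so match probabilities come from an element-wise sigmoid and need not sum to 1. A small sketch with made-up logit values (check the output structure of your KerasHub version before indexing real outputs):

```python
import numpy as np

# Hypothetical pairwise logits for one image scored against the three
# prompts above: "mountains", "cat on tortoise", "house".
logits = np.array([[-8.2, 4.1, -6.5]])

# Element-wise sigmoid: each pair gets an independent probability.
probs = 1.0 / (1.0 + np.exp(-logits))

best = probs.argmax(axis=-1)  # index of the best-matching prompt
```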

Example Usage with Hugging Face URI

import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("hf://keras/siglip2_large_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "hf://keras/siglip2_large_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("hf://keras/siglip2_large_patch16_512")

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# load and preprocess the image
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
output = siglip({
    "images": image,
    "token_ids": tokens,
})