apple/TiC-CLIP-bestpool-cumulative · How to Use this Model for Zero-Shot Image Classification?

eclipticwonder

Jun 14, 2024

Hi,

How can I use this model for zero-shot image classification? Could you provide some sample code?

fartashf

Apple org Jun 17, 2024

Hi,
Thanks for your interest. Here is an example of loading the model and evaluating it on an image:

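import open_clip
import torch
from PIL import Image
from huggingface_hub import hf_hub_download

# Download one checkpoint from the repo (here: checkpoints/2016.pt)
filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-cumulative", filename="checkpoints/2016.pt")
# Build a ViT-B-16 model and load the downloaded weights into it
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', pretrained=filename)
model.eval()  # switch off training-mode behavior before inference

tokenizer = open_clip.get_tokenizer('ViT-B-16')

# Preprocess the image and tokenize the candidate labels
image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scaled similarities -> probabilities over the candidate labels
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)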
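The last few lines of the example are where the zero-shot classification actually happens: both embeddings are L2-normalized, so `image_features @ text_features.T` is a cosine similarity, which is scaled by 100 and passed through a softmax over the candidate labels. Below is a minimal, dependency-free sketch of just that scoring step, assuming you already have the embeddings as plain Python lists. The helper names (`l2_normalize`, `zero_shot_probs`, `predict`) and the toy 3-dimensional vectors are hypothetical and only illustrate the arithmetic; real embeddings come from `model.encode_image` / `model.encode_text`.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (like `x /= x.norm(dim=-1, keepdim=True)`)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(
    image_feature: Sequence[float],
    text_features: Sequence[Sequence[float]],
    scale: float = 100.0,
) -> List[float]:
    """Probability of each text prompt given a single image embedding."""
    img = l2_normalize(image_feature)
    logits = []
    for txt in text_features:
        t = l2_normalize(txt)
        # Dot product of two unit vectors = cosine similarity
        cos = sum(a * b for a, b in zip(img, t))
        logits.append(scale * cos)
    return softmax(logits)


def predict(
    image_feature: Sequence[float],
    text_features: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> Tuple[str, float]:
    """Return the highest-probability label and its probability."""
    probs = zero_shot_probs(image_feature, text_features)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


if __name__ == "__main__":
    labels = ["a diagram", "a dog", "a cat"]
    # Toy embeddings for illustration only (hypothetical values)
    image_feature = [0.1, 0.9, 0.2]
    text_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    label, prob = predict(image_feature, text_features, labels)
    print(label, round(prob, 4))
```

Because the softmax is taken only over the prompts you supply, the resulting probabilities rank those candidate labels against each other; they are not a calibrated confidence that the image shows any of them.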

fartashf changed discussion status to closed Jun 17, 2024

fartashf

Apple org Jun 17, 2024

Please note that these models are released to facilitate research on continual learning. Please refer to the model card for more code examples.