OpenCLIP (LAION-2B)
Contrastive Language-Image Pretraining (CLIP) model pre-trained on LAION-2B at resolution 224x224. It was introduced in the paper Learning Transferable Visual Models From Natural Language Supervision and further reproduced in the follow-up paper Reproducible scaling laws for contrastive language-image learning.
The weights were converted from laion/CLIP-ViT-H-14-laion2B-s32B-b79K, part of the OpenCLIP LAION-2B collection.
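As a minimal sketch of how such a CLIP checkpoint can be used for zero-shot image classification with the Hugging Face `transformers` library (the model ID below follows the source checkpoint named above; label texts and the image URL are illustrative placeholders):

```python
import requests
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load the converted CLIP weights and the matching processor.
model_id = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Illustrative image; any 224x224-compatible RGB image works
# (the processor handles resizing and normalization).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate labels are free-form text prompts.
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, softmaxed into per-label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```

Because CLIP is trained contrastively, the same loaded model can also be used to embed images and texts separately (`get_image_features` / `get_text_features`) for retrieval.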