---
license: mit
---

# CLIP

Contrastive Language-Image Pretraining (CLIP) model pre-trained on LAION-2B at resolution 224x224. It was introduced in the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) and further reproduced in the follow-up paper [Reproducible scaling laws for contrastive language-image learning](https://arxiv.org/abs/2212.07143). The weights were converted from the `laion/CLIP-ViT-L-14-laion2B-s32B-b82K` checkpoint presented in the [OpenCLIP LAION-2B collection](https://huggingface.co/collections/laion/openclip-laion-2b-64fcade42d20ced4e9389b30).
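
A minimal zero-shot classification sketch using the Hugging Face `transformers` CLIP classes. The `model_id` below is a placeholder for this repository's id (not stated in the text above); substitute the actual checkpoint name, and note the example image URL and candidate labels are illustrative only.

```python
import torch
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder: replace with the id of this converted checkpoint.
model_id = "your-namespace/CLIP-ViT-L-14-laion2B-s32B-b82K"

model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Example image and candidate text labels for zero-shot classification.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, turned into probabilities over the labels.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```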