
Model Summary

NLLB-CLIP combines the text encoder of the NLLB model with the image encoder of the standard CLIP model. This extends the model's capabilities to the 201 languages of the Flores-200 dataset. NLLB-CLIP achieves state-of-the-art results on the Crossmodal-3600 dataset, performing especially well on low-resource languages. More details about the model are in the paper.
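A CLIP-style model such as NLLB-CLIP is typically used for zero-shot image classification by comparing an image embedding with the embeddings of text prompts, one per candidate label. The sketch below shows only that scoring step: L2-normalize the embeddings, take cosine similarities, scale them, and apply a softmax. It is a minimal illustration under stated assumptions, not NLLB-CLIP's own code: the three-dimensional vectors are made-up stand-ins for real encoder outputs, the `logit_scale` of 100 is borrowed from the original CLIP's temperature cap rather than taken from this model, and the function names `l2_normalize` and `zero_shot_probs` are hypothetical, not part of OpenCLIP.

```python
import math
from typing import Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a (non-zero) vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(
    image_emb: List[float],
    label_embs: Dict[str, List[float]],
    logit_scale: float = 100.0,  # assumption: CLIP-style temperature, not this model's learned value
) -> Dict[str, float]:
    """Return a probability per label from scaled cosine similarity plus softmax."""
    img = l2_normalize(image_emb)
    labels = list(label_embs)
    # Scaled cosine similarity between the image and each label's text embedding.
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(label_embs[name])))
        for name in labels
    ]
    # Numerically stable softmax: subtract the max logit before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(labels, exps)}


# Hypothetical toy embeddings standing in for the image and text encoder outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "bird": [0.0, 0.0, 1.0],
}

probs = zero_shot_probs(image_emb, label_embs)
print(max(probs, key=probs.get))  # prints "cat"
```

Because the image and text encoders share one embedding space, the label prompts could in principle be written in any of the supported languages; relative to standard CLIP, only the text encoder that produces `label_embs` changes.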

Acknowledgements

I thank ML Collective for providing Google Cloud compute resources to train the OpenCLIP-compatible version of NLLB-CLIP.


Dataset used to train visheratin/nllb-clip-base-oc