This demo uses a CLIP-mBART50 model checkpoint to predict a caption for a given image in four languages (English, French, German, Spanish). The model combines an image encoder with a text decoder and was trained on approximately 5 million image-text pairs taken from the Conceptual 12M dataset, whose captions were translated using mBART50.

New demo coming soon 🤗

For more details, click on Usage or Article 🤗 below.