This demo uses the [CLIP-Vision-Marian model checkpoint](https://huggingface.co/flax-community/spanish-image-captioning/) to predict a Spanish caption for a given image. The model combines an image encoder with a text decoder and was trained on approximately 2.5 million image-text pairs from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m), with the captions translated into Spanish using [Marian](https://huggingface.co/transformers/model_doc/marian.html).
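The sketch below shows how a caption could be generated from such a checkpoint. It assumes the checkpoint is compatible with the `FlaxVisionEncoderDecoderModel`, `AutoImageProcessor`, and `AutoTokenizer` classes from 🤗 Transformers; the actual demo may load a custom CLIP-Vision-Marian model class instead, and the model id and example image URL are only placeholders.

```python
# Minimal captioning sketch (assumes a transformers-compatible vision-encoder-decoder checkpoint).
import requests
from PIL import Image
from transformers import (
    FlaxVisionEncoderDecoderModel,
    AutoImageProcessor,
    AutoTokenizer,
)

# Hypothetical model id; the demo may use a custom model class and repo name.
model_id = "flax-community/spanish-image-captioning"

model = FlaxVisionEncoderDecoderModel.from_pretrained(model_id)
image_processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load an example image from the web.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Encode the image with the CLIP vision encoder, then decode a Spanish caption.
pixel_values = image_processor(images=image, return_tensors="np").pixel_values
output_ids = model.generate(pixel_values, max_length=64, num_beams=4).sequences
caption = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(caption)
```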


For more details, click on `Usage` or `Article` 🤗 below.