gchhablani's picture
Fix image display issue.
547e7ab
|
raw
history blame
630 Bytes
## Abstract
This project is focused on Spanish Image Captioning. Most of the existing datasets and models on this task work with English-only image-text pairs. Our intention here is to show that CLIP Vision + Marian model can be trained on Spanish translation textual checkpoints with pre-trained image encoders and made to perform well enough on this particular task.
Due to lack of good-quality Spanish data, we translate subsets of the Conceptual 12M dataset into Spanish using the Marian MT `Helsinki-NLP/opus-mt-en-es` model. With better translated captions, and hyperparameter-tuning, we expect to see higher performance.