# Italian CLIP
# Novel Contributions

The original CLIP model was trained on 400 million text-image pairs; this amount of data is not available for Italian, and the only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results we followed three directions: 1) more data, 2) better augmentations, and 3) better training.
## More Data

## Better Augmentations
## Better Training

We used a different optimizer and applied backbone freezing.
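The idea behind backbone freezing is that only a subset of the parameters receives gradient updates while the rest stay fixed. A minimal pure-Python sketch of the mechanism follows; it is illustrative only (the parameter names and the SGD helper are made up, not the project's actual training code):

```python
# Sketch of backbone freezing: parameters listed in `frozen` are
# skipped by the optimizer step, so only the remaining ones train.
# All names here are illustrative.

def sgd_step(params, grads, frozen, lr=0.1):
    """Apply SGD to every parameter whose name is not in `frozen`."""
    return {
        name: (value if name in frozen else value - lr * grads[name])
        for name, value in params.items()
    }

params = {"vision_backbone.w": 1.0, "text_backbone.w": 1.0, "projection.w": 1.0}
grads = {name: 0.5 for name in params}

# Freeze both backbones; only the projection layer is updated.
frozen = {"vision_backbone.w", "text_backbone.w"}
params = sgd_step(params, grads, frozen)
print(params)
```

In a real framework the same effect is achieved by marking backbone parameters as non-trainable rather than filtering them by hand.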
# Scientific Validity

To better understand how well our clip-italian model works, we ran an experimental evaluation. Since this is the first CLIP-based model for Italian, we used the multilingual CLIP model as a comparison baseline.
We selected two different tasks:

+ image retrieval
+ zero-shot classification
## Image Retrieval
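Image retrieval ranks the images in a corpus by the similarity between their embeddings and the embedding of a text query. A toy sketch with pre-computed embeddings (all vector values are made up; in practice they come from the text and image encoders):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(text_emb, image_embs, top_k=2):
    """Return the indices of the top_k images most similar to the query."""
    ranked = sorted(range(len(image_embs)),
                    key=lambda i: cosine(text_emb, image_embs[i]),
                    reverse=True)
    return ranked[:top_k]

# Toy pre-computed embeddings.
query = [1.0, 0.0]
images = [[0.0, 1.0], [0.9, 0.1], [0.7, 0.7]]
print(retrieve(query, images))  # → [1, 2]
```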
## Zero-shot classification
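Zero-shot classification turns each class label into a caption-style prompt, embeds the prompts with the text encoder, and assigns the image to the class whose prompt embedding is closest to the image embedding. A toy sketch (the prompt wording and all embedding values are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, prompt_embs):
    """Pick the prompt whose embedding is most similar to the image."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))

# One Italian prompt per class; embeddings here are toy values.
prompt_embs = {
    "una foto di un gatto": [1.0, 0.1],
    "una foto di un cane": [0.1, 1.0],
}
image_emb = [0.9, 0.2]
print(zero_shot_classify(image_emb, prompt_embs))  # → una foto di un gatto
```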
# Broader Outlook