metadata

language: it
license: null
datasets:
  - wit
  - ctl/conceptualCaptions
  - mscoco-it
tags:
  - italian
  - bert
  - vit
  - vision

CLIP-Italian

CLIP-Italian is a CLIP-like model for Italian. CLIP (Contrastive Language–Image Pre-training), developed by researchers at OpenAI, learns visual concepts efficiently from natural language supervision.
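To make the contrastive objective concrete, here is a minimal pure-Python sketch of a symmetric image-text contrastive loss of the kind CLIP-style models optimize. It is an illustration under stated assumptions, not this project's training code: the function names, the toy two-dimensional embeddings, and the temperature default of 0.07 are all chosen for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    highest = max(row)
    log_z = highest + math.log(sum(math.exp(x - highest) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Pair i (image i, caption i) is treated as the only positive;
    every other pairing in the batch is a negative.
    The temperature value is an illustrative assumption.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: scaled cosine similarity between image i and caption j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each row should select its own column.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each column should select its own row.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (image_to_text + text_to_image) / 2


# Toy batch: two images and two captions with hand-picked embeddings.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = clip_contrastive_loss(toy_images, toy_images)
swapped_loss = clip_contrastive_loss(toy_images, toy_images[::-1])
```

In this toy batch the correctly paired embeddings give a loss close to zero, while swapping the captions gives a much larger one. That gap is the signal that pulls matching images and captions together during training.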

We fine-tuned a competitive Italian CLIP model with only ~1.4 million Italian image-text pairs. This model was built as part of the Flax/JAX Community Week, organized by Hugging Face, with TPU usage sponsored by Google.

Training Data

We considered three main sources of data:

WIT
MSCOCO-IT
Conceptual Captions

Training Procedure

Preprocessing, hardware used, hyperparameters...

Evaluation Performance

Limitations

Usage

Team members

Federico Bianchi (vinid)
Raphael Pisoni (4rtemi5)
Giuseppe Attanasio (g8a9)
Silvia Terragni (silviatti)
Dario Balestri (D3Reo)
Gabriele Sarti (gsarti)
Sri Lakshmi (srisweet)

Useful links

CLIP Blog post
CLIP paper
Community Week README
Community Week channel
Hybrid CLIP example scripts
Model Repository