4rtemi5 committed
Commit 4a0f49b (1 parent: e747f27)

Update readme.md

Files changed (1): readme.md (+9 -10)
readme.md CHANGED
@@ -1,25 +1,25 @@
 # Italian CLIP
 
-With a few tricks, we have been able to fine-tune a competitive CLIP-italian model with only 1 million training samples.
+With a few tricks, we have been able to fine-tune a competitive Italian CLIP model with only 1.4 million training samples.
 
 In building this project we kept in mind the following things:
 
-+ **Novel Contributions**: we tried to bring something new to the table;
-+ **Scientific Validity**: models can look very cool, but external validation is important to assess the real impact;
++ **Novel Contributions**: we created a dataset of ~1.4 million Italian image-text pairs and, to our knowledge, trained the best Italian CLIP model currently in existence;
++ **Scientific Validity**: claims are easy, facts are hard. External validation is important to assess the real impact of a model, so we thoroughly evaluated our models and made the validation reproducible for everybody;
 + **Broader Outlook**: we always considered the possible uses of this model.
 
-We put our **hearts** and **souls** in this project during this week! Not only we worked on a cool project, but we were
-able to meet new people and make new friends that worked together for a common goal!
-Thank you for this amazing opportunity, we hope you will like our project :heart:.
+We put our **hearts** and **souls** into this project during this week! Not only did we work on a cool project, but we were
+also able to make new friends and learn a lot from each other while working towards a common goal!
+Thank you for this amazing opportunity; we hope you will like the results. :heart:
 
 # Novel Contributions
 
-The original CLIP model was trained on 400millions text-image pairs; this amount of data is not available for Italian and the only datasets for captioning in the literature are MSCOCO-IT (translated version of MSCOCO) and WIT. To get competitive results we follewed three directions: 1) more data 2) better augmentation and 3) better training.
+The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian, and the only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results we followed three strategies: 1) more data, 2) better augmentations, and 3) better training.
 
 ## More Data
 
 We eventually had to deal with the fact that we do not have the same data that OpenAI had during the training of CLIP.
-Thus, we opted for one choice, data of medium-high quality.
+Thus, we tried to add as much data as possible while keeping the data quality as high as possible.
 
 We considered three main sources of data:
 
@@ -67,7 +67,6 @@ We selected two different tasks:
 + image-retrieval
 + zero-shot classification
 
-
 ### Image Retrieval
 
 | MRR | CLIP-Italian | mCLIP |
@@ -79,7 +78,7 @@ We selected two different tasks:
 
 ### Zero-shot classification
 
-| Accuracy | CLIP-Italian | mCLIP |
+| Accuracy | CLIP-Italian | mCLIP |
 | --------------- | ------------ |-------|
 | Accuracy@1 | | |
 | Accuracy@5 | | |
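The two evaluation tables above report MRR for image retrieval and Accuracy@k for zero-shot classification. The commit does not include the evaluation code, so as a minimal sketch (function names and the assumption that query i matches image/class i are ours, not the authors'), both metrics can be computed from a query-by-candidate similarity matrix like the one CLIP produces:

```python
import numpy as np

def mean_reciprocal_rank(similarity: np.ndarray) -> float:
    """MRR for retrieval: row i holds the similarities of text query i to
    every image; the matching image is assumed to sit at column i."""
    # Rank of the correct image = how many images scored at least as high.
    ranks = (similarity >= similarity.diagonal()[:, None]).sum(axis=1)
    return float((1.0 / ranks).mean())

def top_k_accuracy(similarity: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Accuracy@k for zero-shot classification: row i holds the similarities
    of image i to every class prompt; labels[i] is the true class index."""
    top_k = np.argsort(-similarity, axis=1)[:, :k]  # k best classes per image
    return float((top_k == labels[:, None]).any(axis=1).mean())
```

For example, with a 3x3 similarity matrix where the second query ranks its match second, MRR is (1 + 1/2 + 1)/3; Accuracy@1 counts only exact argmax hits, while Accuracy@5 also credits matches anywhere in the top five.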