Silvia Terragni committed on
Commit
acfaaf8
1 Parent(s): 5fa6a85

Update README.md

Files changed (1): readme.md (+17 −9)
readme.md CHANGED
@@ -1,12 +1,12 @@
 # Italian CLIP
 
-With a few tricks, we have been able to fine-tune a competitive Italian CLIP model with only 1.4 million training samples.
+With a few tricks, we have been able to fine-tune a competitive Italian CLIP model with **only 1.4 million** training samples.
 
-In building this project we kept in mind the following things:
+In building this project we kept in mind the following principles:
 
-+ **Novel Contributions**: We created a dataset of ~1.4 million Italian image-text pairs and to our knowledge trained the best Italian CLIP model currently in existence;
-+ **Scientific Validity**: Claims are easy, facts are hard. That's why validation is important to assess the real impact of a model. We thoroughly evaluated our models and made the validation reproducible for everybody.
-+ **Broader Outlook**: we always considered the possible uses of this model.
++ **Novel Contributions**: We created a dataset of ~1.4 million Italian image-text pairs and, to the best of our knowledge, we trained the best Italian CLIP model currently in existence;
++ **Scientific Validity**: Claims are easy, facts are hard. That's why validation is important to assess the real impact of a model. We thoroughly evaluated our models on several tasks and made the validation reproducible for everybody.
++ **Broader Outlook**: We always kept in mind the possible uses of this model.
 
 We put our **hearts** and **souls** into the project during this week! Not only did we work on a cool project, but we were
 able to make new friends and learn a lot from each other to work towards a common goal!
@@ -14,7 +14,12 @@ Thank you for this amazing opportunity, we hope you will like the results. :hear
 
 # Novel Contributions
 
-The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian and the only datasets for captioning in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results we followed three strategies: 1) more data, 2) better augmentations and 3) better training.
+The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian.
+We indeed worked in a **low-resource setting**. The only datasets for captioning in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT.
+To get competitive results we followed three strategies:
+1. more data;
+2. better augmentations;
+3. better training.
 
 ## More Data
 
@@ -24,9 +29,12 @@ Thus, we tried to add as much data as possible while keeping the data-quality as
 We considered three main sources of data:
 
 + WIT. Most of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994).
-However, this kind of text, without more information, is not useful to learn a good mapping between images and captions. On the other hand,
-this text is written in Italian and it is of good quality. To prevent polluting the data with captions that are not meaningful, we used POS tagging
-on the data and removed all the captions that were composed of 80% or more proper nouns (PROPN).
+However, this kind of text, without more information, is not useful to learn a good mapping between images and captions.
+On the other hand, this text is written in Italian and it is of good quality.
+To prevent polluting the data with captions that are not meaningful, we used POS tagging
+on the data and removed all the captions that were composed of 80% or more proper nouns (PROPN).
+
+Example: ....
 
 + MSCOCO-IT.
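
The PROPN filter described in this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and in practice the POS tags would come from an Italian POS tagger (e.g., spaCy's `it_core_news_sm` pipeline) run over each caption.

```python
def propn_fraction(pos_tags):
    """Fraction of tokens tagged as proper nouns (Universal POS tag PROPN)."""
    if not pos_tags:
        return 0.0
    return sum(tag == "PROPN" for tag in pos_tags) / len(pos_tags)

def keep_caption(pos_tags, threshold=0.80):
    """Keep a caption only if fewer than `threshold` of its tokens are PROPN.

    Captions at 80% or more PROPN (like bare "name + year" captions) are dropped.
    """
    return propn_fraction(pos_tags) < threshold

# "Roberto Baggio nel 1994" -> PROPN PROPN ADP NUM: 2/4 = 0.5, kept
print(keep_caption(["PROPN", "PROPN", "ADP", "NUM"]))      # True

# A caption that is entirely proper nouns: 4/4 = 1.0, dropped
print(keep_caption(["PROPN", "PROPN", "PROPN", "PROPN"]))  # False
```

The strict `<` comparison matches the "80% or more" wording: a caption at exactly 80% PROPN is removed.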