vinid committed
Commit 2175e1c (1 parent: 608a0a7)

new stuff in readme.md

Files changed (1): readme.md (+16, -2)
readme.md CHANGED
@@ -2,6 +2,14 @@

With a few tricks, we have been able to fine-tune a competitive CLIP-Italian model with only 1 million training samples.

+ In building this project we kept in mind the following things:
+
+ + **Novel Contributions**: we tried to bring something new to the table
+ + **Scientific Validity**: models can look very cool, but external validation is important to assess their real impact
+ + **Broader Outlook**: we always considered the possible uses of this model
+
+ We put our hearts and souls into this project during this week, and we hope you will like it :heart:
+

# Novel Contributions

The original CLIP model was trained on 400 million text-image pairs; this amount of data is not available for Italian, and the only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results, we followed three directions: 1) more data, 2) better augmentations, and 3) better training.
 
@@ -18,17 +26,23 @@ However, this kind of text, without more information, is not useful to learn a g
this text is written in Italian and is of good quality. To prevent polluting the data with captions that are not meaningful, we used POS tagging
on the data and removed all the captions that were composed of 80% or more proper nouns (PROPN).

- + MSCOCO-IT
+ + MSCOCO-IT.

- + CC
+ + Conceptual Captions.

## Better Augmentations

## Better Training

+ After different trials, we realized that the usual way of training this model was
+ not good enough to get good results. We thus modified two different parts of the
+ training pipeline: the optimizer and the training with frozen components.
+
### Optimizer

+ The standard AdamW didn't seem enough to train the model...
+

### Backbone Freezing
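The `### Backbone Freezing` section is still empty in this commit; the added text only says that training "with frozen components" was part of the fix. As a hedged sketch of the general technique, not necessarily the authors' setup, the snippet below freezes pretrained encoders and trains only the projection heads in PyTorch; the class and attribute names (`ToyCLIP`, `visual`, `text_model`, `*_projection`) are illustrative assumptions.

```python
# Hedged sketch of backbone freezing: train only the projection heads while
# the pretrained encoders stay frozen, then unfreeze for a second phase.
# All names below are illustrative, not the repository's actual API.
import torch
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad = False

def trainable_params(model: nn.Module):
    return [p for p in model.parameters() if p.requires_grad]

class ToyCLIP(nn.Module):
    def __init__(self, dim: int = 512, proj: int = 256):
        super().__init__()
        self.visual = nn.Linear(dim, dim)       # stand-in for a vision backbone
        self.text_model = nn.Linear(dim, dim)   # stand-in for a text backbone
        self.visual_projection = nn.Linear(dim, proj)
        self.text_projection = nn.Linear(dim, proj)

model = ToyCLIP()
freeze(model.visual)        # phase 1: only the projections receive gradients
freeze(model.text_model)
optimizer = torch.optim.AdamW(trainable_params(model), lr=1e-4)

# Phase 2: unfreeze everything and continue at a lower learning rate.
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

A common rationale for this schedule is that the randomly initialized projection layers would otherwise push large, noisy gradients into the pretrained encoders early in training.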
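The caption-cleaning step in the second hunk (drop any caption whose tokens are 80% or more PROPN) is easy to prototype. Below is a minimal sketch assuming spaCy with its Italian pipeline `it_core_news_sm`; the commit does not say which POS tagger was actually used.

```python
# Minimal sketch of the caption filter described in the diff: drop captions
# whose tokens are 80% or more proper nouns (PROPN). spaCy and the
# `it_core_news_sm` model are assumptions, not the commit's stated tooling.
# Requires: pip install spacy && python -m spacy download it_core_news_sm
import spacy

nlp = spacy.load("it_core_news_sm")  # Italian pipeline with a POS tagger

def keep_caption(caption: str, max_propn_ratio: float = 0.8) -> bool:
    """Return False for captions made up mostly of proper nouns."""
    doc = nlp(caption)
    tokens = [t for t in doc if not t.is_space]
    if not tokens:
        return False
    propn = sum(t.pos_ == "PROPN" for t in tokens)
    return propn / len(tokens) < max_propn_ratio

captions = [
    "Una bicicletta rossa appoggiata al muro",   # kept: descriptive caption
    "Mario Rossi Giulia Bianchi Luca Verdi",     # dropped: nearly all PROPN
]
filtered = [c for c in captions if keep_caption(c)]
```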