new stuff in readme.md

readme.md CHANGED

@@ -2,6 +2,14 @@

With a few tricks, we have been able to fine-tune a competitive CLIP-italian model with only 1 million training samples.

In building this project we kept in mind the following things:

+ **Novel Contributions**: we tried to bring something new to the table
+ **Scientific Validity**: models can look very cool, but external validation is important to assess their real impact
+ **Broader Outlook**: we always considered the possible uses of this model

We put our hearts and souls into this project during this week and we hope you will like it :heart:

# Novel Contributions

The original CLIP model was trained on 400 million text-image pairs; this amount of data is not available for Italian, and the only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT. To get competitive results we followed three directions: 1) more data, 2) better augmentations, and 3) better training.

@@ -18,17 +26,23 @@ However, this kind of text, without more information, is not useful to learn a g

this text is written in Italian and it is of good quality. To prevent polluting the data with captions that are not meaningful, we used POS tagging on the data and removed all the captions that were composed of 80% or more PROPN tokens.
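
As a rough illustration of this filtering step (not the project's actual code), here is a minimal sketch assuming spaCy's Italian pipeline `it_core_news_sm` and an in-memory list of captions; the 0.8 threshold mirrors the 80% rule above:

```python
import spacy

# Hedged sketch: drop captions that are mostly proper nouns (PROPN), assuming
# spaCy's small Italian pipeline is installed
# (python -m spacy download it_core_news_sm).
nlp = spacy.load("it_core_news_sm")

def propn_ratio(caption: str) -> float:
    """Return the fraction of non-punctuation tokens tagged as PROPN."""
    doc = nlp(caption)
    tokens = [t for t in doc if not t.is_punct and not t.is_space]
    if not tokens:
        return 1.0  # treat empty/degenerate captions as junk
    return sum(t.pos_ == "PROPN" for t in tokens) / len(tokens)

def keep_caption(caption: str, threshold: float = 0.8) -> bool:
    """Keep a caption only if PROPN tokens make up less than `threshold` of it."""
    return propn_ratio(caption) < threshold

# Example: the second caption is almost entirely proper nouns and is dropped.
captions = [
    "Un gatto dorme su un divano rosso vicino alla finestra.",
    "Roberto Baggio Italia USA 1994",
]
clean = [c for c in captions if keep_caption(c)]
```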

+ MSCOCO-IT.
+ Conceptual Captions.

## Better Augmentations

## Better Training

After different trials, we realized that the usual way of training this model was not good enough to get good results. We thus modified two different parts of the training pipeline: the optimizer and the training with frozen components.

### Optimizer

The standard AdamW didn't seem enough to train the model...
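
For reference, a hypothetical sketch of the "standard" AdamW baseline this refers to, written in PyTorch; the learning rate, weight decay, and whatever the team ultimately changed are not specified in this excerpt, so the values below are placeholders:

```python
import torch

# Placeholder model standing in for the image/text encoders and projections.
model = torch.nn.Linear(512, 512)

# The plain AdamW setup described above as not being enough on its own;
# hyperparameters here are illustrative, not the project's.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)

for _ in range(3):  # toy loop; a real run iterates over image-text batches
    loss = model(torch.randn(8, 512)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```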

### Backbone Freezing
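
The body of this section is not included in the diff, so only as a generic sketch: one common way to train with frozen components is to freeze the pretrained image and text backbones and update just the projection layers, e.g. in PyTorch (module names here are hypothetical, not the project's):

```python
import torch
from torch import nn

# Hypothetical dual-encoder sketch: freeze the pretrained backbones and train
# only the projection heads. This illustrates the general "backbone freezing"
# idea; the actual components and schedule used by the authors are not shown here.
class DualEncoder(nn.Module):
    def __init__(self, dim=768, proj_dim=512):
        super().__init__()
        self.image_backbone = nn.Sequential(nn.Linear(dim, dim), nn.GELU())  # stand-in for a ViT
        self.text_backbone = nn.Sequential(nn.Linear(dim, dim), nn.GELU())   # stand-in for BERT
        self.image_proj = nn.Linear(dim, proj_dim)
        self.text_proj = nn.Linear(dim, proj_dim)

model = DualEncoder()

# Freeze both backbones so gradients only flow into the projection layers.
for module in (model.image_backbone, model.text_backbone):
    for p in module.parameters():
        p.requires_grad = False

# Optimize only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

Whether and when the backbones are later unfrozen is a separate scheduling choice that this excerpt does not specify.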