# Novel Contributions

The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian.
We worked in a **low-resource setting**: the only datasets for Italian captioning in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT.
To get competitive results we followed three strategies:

1. more and better data;
2. better augmentations;
3. better training.

## More and Better Data

We eventually had to deal with the fact that we do not have the same data that OpenAI had during the training of CLIP.
Thus, we tried to add as much data as possible while keeping the data quality as high as possible.

We considered three main sources of data:

+ [WIT](https://github.com/google-research-datasets/wit) is an image-caption dataset collected from Wikipedia (see [Srinivasan et al., 2021](https://arxiv.org/pdf/2103.01913.pdf)). We focused on the *Reference Description* captions described in the paper, as they are the ones of highest quality. Nonetheless, many of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994). Without more information, this kind of text is not useful for learning a good mapping between images and captions. On the other hand, this text is written in Italian and it is of good quality. To prevent polluting the data with captions that are not meaningful, we used *POS tagging* on the text and removed all the captions composed of 80% or more proper nouns (PROPN). This simple solution allowed us to retain much of the dataset without introducing noise.

Example: ....
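The PROPN filter described above is easy to sketch. In practice the tags would come from an Italian POS tagger (e.g. spaCy's `it_core_news_sm` pipeline, which is our assumption here, not something this README specifies), but the filtering rule itself only needs the tag sequence:

```python
def propn_ratio(pos_tags):
    """Fraction of tokens tagged PROPN (proper noun)."""
    if not pos_tags:
        return 0.0
    return sum(tag == "PROPN" for tag in pos_tags) / len(pos_tags)

def keep_caption(pos_tags, threshold=0.8):
    """Keep a caption unless 80% or more of its tokens are proper nouns."""
    return propn_ratio(pos_tags) < threshold

# A caption like "Roberto Baggio Juventus" (all PROPN) is dropped,
# while "una partita di calcio a Torino" (mostly common words) is kept.
```

Any Italian POS tagger producing Universal Dependencies tags can feed this filter; the 80% threshold is the value used above.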
## Qualitative Evaluation

We hereby show some very interesting properties of the model. The first is its ability to detect colors; the second is its (partial) counting ability. To our own surprise, many of the answers the model gives make a lot of sense!
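These probes amount to zero-shot classification in the usual CLIP style: embed the image and a set of candidate Italian captions (e.g. "una macchina rossa" vs. "una macchina blu") with the two encoders, then pick the caption most similar to the image. A minimal sketch of the scoring step, with placeholder vectors standing in for the encoders' outputs (the actual encoders are not shown here):

```python
import numpy as np

def zero_shot_pick(image_emb, caption_embs):
    """Return the index of the candidate caption whose embedding has the
    highest cosine similarity with the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    caps = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    return int(np.argmax(caps @ img))

# Placeholder embeddings: in a real run these come from the image and
# text encoders applied to a photo and to the candidate captions.
image_emb = np.array([1.0, 0.0, 0.0])
caption_embs = np.array([
    [0.0, 1.0, 0.0],   # "una macchina blu"
    [0.9, 0.1, 0.0],   # "una macchina rossa"
    [0.0, 0.0, 1.0],   # "una macchina verde"
])
best = zero_shot_pick(image_emb, caption_embs)
```

The same scoring loop works for counting by swapping in captions such as "una mela", "due mele", "tre mele".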
### Colors

### Counting

# Broader Outlook