vinid committed on
Commit 9ea982d
1 Parent(s): 3140e4f

updating the readme.md

Files changed (1)
  1. readme.md +13 -8
readme.md CHANGED
@@ -15,13 +15,13 @@ Thank you for this amazing opportunity, we hope you will like the results. :hear
 # Novel Contributions
 
 The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian.
-We indeed worked in a **low-resource setting**. The only datasets for captioning in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT.
+We indeed worked in a **low-resource setting**. The only datasets for Italian captioning in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT.
 To get competitive results we followed three strategies:
-1. more data;
+1. more and better data;
 2. better augmentations;
 3. better training.
 
-## More Data
+## More and Better Data
 
 We eventually had to deal with the fact that we do not have the same data that OpenAI had during the training of CLIP.
 Thus, we tried to add as much data as possible while keeping the data-quality as high as possible.
@@ -29,11 +29,13 @@ Thus, we tried to add as much data as possible while keeping the data-quality as
 We considered three main sources of data:
 
 + [WIT](https://github.com/google-research-datasets/wit) is an image-caption dataset collected from Wikipedia (see,
-[Srinivasan et al., 2021](https://arxiv.org/pdf/2103.01913.pdf)). Most of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994).
+[Srinivasan et al., 2021](https://arxiv.org/pdf/2103.01913.pdf)). We focused on the *Reference Description* captions described in the paper, as they are
+the ones of highest quality. Nonetheless, many of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994).
 However, this kind of text, without more information, is not useful to learn a good mapping between images and captions.
-On the other hand, this text is written in Italian and it is good quality.
-To prevent polluting the data with captions that are not meaningful, we used POS tagging
-on the data and removed all the captions that were composed for the 80% or more by PROPN.
+On the other hand, this text is written in Italian and it is of good quality.
+To prevent polluting the data with captions that are not meaningful, we used *POS tagging*
+on the text and removed all the captions in which 80% or more of the tokens were proper nouns (PROPN). This is a simple solution that allowed us to retain much
+of the dataset without introducing noise.
 
 Example: ....
 
@@ -124,9 +126,12 @@ the translated image labels might have had an impact on the final scores.
 
 ## Qualitative Evaluation
 
+Here we show some very interesting properties of the model. The first is its ability to detect colors, and the second is its (partial) counting
+ability. To our own surprise, many of the answers the model gives make a lot of sense!
+
 ### Colors
 
-### Numbers
+### Counting
 
 # Broader Outlook
 
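
The PROPN-based caption filter added in this revision is straightforward to reproduce. Below is a minimal sketch, assuming spaCy's Italian pipeline (`it_core_news_sm`) as the POS tagger and the 80% threshold mentioned above; the commit does not specify which tagger or tokenization was actually used, so these choices are illustrative.

```python
# Minimal sketch of the PROPN-based caption filter described above.
# Assumptions (not specified in the commit): spaCy's Italian pipeline is the
# POS tagger, and punctuation/whitespace tokens are ignored in the ratio.
import spacy

nlp = spacy.load("it_core_news_sm")  # hypothetical choice of Italian POS tagger


def keep_caption(caption: str, max_propn_ratio: float = 0.8) -> bool:
    """Keep a caption only if fewer than max_propn_ratio of its tokens are PROPN."""
    tokens = [t for t in nlp(caption) if not t.is_punct and not t.is_space]
    if not tokens:
        return False  # empty or punctuation-only captions are dropped
    propn_ratio = sum(t.pos_ == "PROPN" for t in tokens) / len(tokens)
    return propn_ratio < max_propn_ratio


captions = [
    "Roberto Baggio",                                  # 100% PROPN -> dropped
    "Un gatto arancione dorme su una sedia di legno",  # descriptive -> kept
]
print([c for c in captions if keep_caption(c)])
```

Computing the PROPN share over non-punctuation tokens drops name-only captions while keeping descriptive sentences that merely mention a name.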