vinid committed on
Commit
e5ec521
1 Parent(s): 5fcf75e

new stuff in readme.md

Files changed (1)
  1. readme.md +48 -2
readme.md CHANGED
@@ -7,21 +7,67 @@ The original CLIP model was trained on 400 million text-image pairs; this amount
- different optimizer and backbone freezing
- # Broader Outlook

## More Data

+ We eventually had to deal with the fact that we do not have the same data that OpenAI had during the training of CLIP.
+ Thus, we opted for data of medium-to-high quality.
+
+ We considered three main sources of data:
+
+ + WIT. Most of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994).
+ However, this kind of text, without more context, is not useful for learning a good mapping between images and captions. On the other hand,
+ the text is written in Italian and is of good quality. To avoid polluting the data with captions that are not meaningful, we ran POS tagging
+ on the data and removed all captions composed of 80% or more proper nouns (PROPN).
+ + MSCOCO-IT
+ + CC
+
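The PROPN filter described above can be sketched in a few lines. This is an illustrative sketch, not the actual cleaning script: it assumes each caption has already been tagged with Universal POS tags (e.g., by spaCy's Italian pipeline), and the example captions and tags are hypothetical.

```python
def is_mostly_propn(pos_tags, threshold=0.8):
    """True when at least `threshold` of the tokens are proper nouns (PROPN)."""
    if not pos_tags:
        return True  # treat empty captions as uninformative
    return sum(tag == "PROPN" for tag in pos_tags) / len(pos_tags) >= threshold

# Hypothetical captions with precomputed Universal POS tags:
captions = [
    ("Roberto Baggio", ["PROPN", "PROPN"]),  # pure entity label, dropped
    ("un cane corre nel parco", ["DET", "NOUN", "VERB", "ADP", "NOUN"]),
]
kept = [text for text, tags in captions if not is_mostly_propn(tags)]
# kept == ["un cane corre nel parco"]
```

The threshold of 0.8 mirrors the "80% or more" cutoff stated above; captions that are almost entirely named entities carry little compositional signal for contrastive training.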
## Better Augmentations

## Better Training

+ ### Optimizer
+
+ ### Backbone Freezing
+
+ ![Backbone Freezing](static/img/clip-italian.png)
+

# Scientific Validity
+ Those images are definitely cool and interesting, but a model is nothing without validation.
To better understand how well our CLIP-Italian model works, we ran an experimental evaluation. Since this is the first CLIP-based model for Italian, we used the multilingual CLIP model (mCLIP) as a comparison baseline.

+ ## mCLIP
+
+ ## Tasks
+
We selected two different tasks:
+ image-retrieval
+ zero-shot classification
+

## Image Retrieval

+ | MRR    | CLIP-Italian | mCLIP |
+ | ------ | ------------ | ----- |
+ | MRR@1  |              |       |
+ | MRR@5  |              |       |
+ | MRR@10 |              |       |
+
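For reference, MRR@K is computed from the 1-based rank at which the correct image appears in each query's retrieval list; ranks beyond K contribute zero. A generic sketch (the example ranks are hypothetical, not our evaluation output):

```python
def mrr_at_k(ranks, k):
    """Mean Reciprocal Rank truncated at k.

    `ranks` holds the 1-based position of the correct image in each
    query's retrieval list; positions beyond k contribute 0.
    """
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

# Hypothetical ranks for three text queries:
mrr_at_k([1, 3, 12], 10)  # (1 + 1/3 + 0) / 3
```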
## Zero-shot classification

+ | Accuracy     | CLIP-Italian | mCLIP |
+ | ------------ | ------------ | ----- |
+ | Accuracy@1   |              |       |
+ | Accuracy@5   |              |       |
+ | Accuracy@10  |              |       |
+ | Accuracy@100 | 81.08        | 67.11 |
+
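Accuracy@K counts an image as correct when the true label's text embedding is among the K labels most similar to the image embedding. A minimal pure-Python sketch of the metric with toy 2-D embeddings (not our actual pipeline, which uses the model's real encoders):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def accuracy_at_k(image_embs, label_embs, targets, k):
    """Fraction of images whose true label index is in the top-k by cosine similarity."""
    hits = 0
    for emb, target in zip(image_embs, targets):
        ranked = sorted(range(len(label_embs)), key=lambda i: -cosine(emb, label_embs[i]))
        hits += target in ranked[:k]
    return hits / len(image_embs)

# Toy embeddings: two images, two class labels
images = [[1.0, 0.1], [0.1, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
accuracy_at_k(images, labels, targets=[0, 1], k=1)  # → 1.0
```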
+ # Broader Outlook
+
+ This readme has been designed using resources from Flaticon.com