vinid committed on
Commit fc8611a
1 Parent(s): a5b18fc

updating the readme

Files changed (1)
  1. introduction.md +14 -2
introduction.md CHANGED
@@ -70,7 +70,7 @@ Our implementation is available online [here](https://github.com/clip-italian/cl
  ### Backbone Freezing
 
  The ViT used by OpenAI was already trained on 400 million images, and it is probably the element of our architecture that required the least training.
- The same is true for the BERT model we use. To allow the randomly initialized Re-projection Layers to warm up without messing with the tuned weights of the backbones we decided to do a first training with the backbones of our architecture completely frozen. Only after these layers converged did we unfreeze the rest of the model to fine-tune all the components. This technique allowed us to reach a much better validation loss.
 
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/clip-italian.png" alt="drawing" width="50%"/>
 
@@ -139,21 +139,33 @@ then there is its (partial) counting ability and finally the ability of understa
  Look at the following examples, slightly cherry-picked (but not even that much):
 
  ### Colors
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_giallo.png" alt="drawing" width="600"/>
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_blu.png" alt="drawing" width="600"/>
 
  ### Counting
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto.png" alt="drawing" width="600"/>
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_gatti.png" alt="drawing" width="600"/>
 
  ### Complex Queries
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_cavalli_marroni.png" alt="drawing" width="600"/>
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto_su_sedia.png" alt="drawing" width="600"/>
 
 
  # Broader Outlook
 
- We believe that this model can be useful for many different applications, not only in research settings. Italy has many different collections
  of photos in digital format. For example, the [Istituto Luce Cinecittà](https://it.wikipedia.org/wiki/Istituto_Luce_Cinecitt%C3%A0) is an Italian government entity that has been collecting photos of Italy since the
  early 1900s and is part of Cinecittà, one of the largest movie studios in Europe.
 
 
  ### Backbone Freezing
 
  The ViT used by OpenAI was already trained on 400 million images, and it is probably the element of our architecture that required the least training.
+ The same is true for the BERT model we use. To let the randomly initialized Re-projection Layers warm up without disrupting the tuned weights of the backbones, we first trained the model with both backbones completely frozen. Only after these layers converged did we unfreeze the rest of the model to fine-tune all the components. This technique allowed us to reach a much lower validation loss.
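The freeze-then-unfreeze schedule described above can be sketched as follows. This is a minimal, framework-free toy, not the CLIP-Italian training code: the parameter-group names, the per-group quadratic loss, the learning rate and the step counts are all hypothetical. It only illustrates the mechanism: frozen groups receive no updates during the warm-up phase and start moving once they are unfrozen.

```python
# Toy illustration of "freeze the backbones, warm up the new layers, then
# unfreeze everything". Group names, targets and hyperparameters are hypothetical.

def toy_gradients(params, targets):
    """Gradient of the toy loss sum((p - t) ** 2), computed for every group."""
    return {
        name: [2.0 * (p - t) for p, t in zip(values, targets[name])]
        for name, values in params.items()
    }

def sgd_phase(params, targets, trainable, steps, lr=0.1):
    """Run plain SGD, updating only the groups listed in `trainable`."""
    for _ in range(steps):
        grads = toy_gradients(params, targets)
        for name in trainable:  # frozen groups are simply skipped
            params[name] = [p - lr * g for p, g in zip(params[name], grads[name])]
    return params

params = {
    "vision_backbone": [0.5, -0.2],  # stand-in for pretrained ViT weights
    "text_backbone": [0.3, 0.1],     # stand-in for pretrained BERT weights
    "projection": [0.0, 0.0],        # stand-in for a newly initialized re-projection layer
}
targets = {
    "vision_backbone": [0.6, -0.1],
    "text_backbone": [0.2, 0.2],
    "projection": [1.0, -1.0],
}

# Phase 1: both backbones frozen, only the projection layer is trained.
sgd_phase(params, targets, trainable=["projection"], steps=50)
after_phase1 = {name: list(values) for name, values in params.items()}

# Phase 2: unfreeze the backbones and fine-tune all components together.
sgd_phase(params, targets, trainable=list(params), steps=50)
print(after_phase1["vision_backbone"], params["vision_backbone"])
```

In JAX with Optax, one common way to get the same effect is `optax.multi_transform`, routing the frozen parameter groups to `optax.set_to_zero()` during the first phase; this is offered as an illustration, not as what the project's training script necessarily does.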
 
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/clip-italian.png" alt="drawing" width="50%"/>
 
 
  Look at the following examples, slightly cherry-picked (but not even that much):
 
  ### Colors
+ Here's a yellow flower
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_giallo.png" alt="drawing" width="600"/>
+
+ And here's a blue flower
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_blu.png" alt="drawing" width="600"/>
 
  ### Counting
+ What about "one cat"?
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto.png" alt="drawing" width="600"/>
+
+ And what about "two cats"?
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_gatti.png" alt="drawing" width="600"/>
 
  ### Complex Queries
+ Have you ever seen "two brown horses"?
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_cavalli_marroni.png" alt="drawing" width="600"/>
+ And finally, here's a very nice "cat on a chair"
  <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto_su_sedia.png" alt="drawing" width="600"/>
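Conceptually, each of these examples ranks a set of images by how similar their embeddings are to the embedding of the Italian query. The sketch below shows only that ranking step and assumes the embeddings have already been computed by the two encoders: the 3-dimensional vectors are made up and merely attached to the file names used above, so the output illustrates the mechanism, not CLIP-Italian's actual scores.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_embedding, image_embeddings, top_k=2):
    """Return the top_k image names, sorted by decreasing cosine similarity."""
    scored = [(cosine(query_embedding, emb), name)
              for name, emb in image_embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical embeddings; a real system would compute them with the encoders.
images = {
    "fiore_giallo.png": [0.9, 0.1, 0.0],
    "fiore_blu.png": [0.1, 0.9, 0.0],
    "gatto.png": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical stand-in for the text embedding of "fiore giallo"
print(search(query, images, top_k=1))  # ['fiore_giallo.png']
```

In practice the image embeddings would be computed once and cached, so answering a new query only costs one text-encoder pass plus the similarity ranking.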
 
 
  # Broader Outlook
 
+ We believe that this model can be useful for many different applications: from image classification
+ to clustering, a model like CLIP Italian can support researchers and practitioners in a wide range of tasks.
+ Indeed, it can be useful not only in research but also in industry. A particularly interesting use case is e-commerce:
+ these websites deal with text, mostly coming from the search engine, and with large numbers of product images. CLIP Italian
+ can be a killer app in this context, providing a way to search across both images and text. Beyond e-commerce, Italy has many different collections
  of photos in digital format. For example, the [Istituto Luce Cinecittà](https://it.wikipedia.org/wiki/Istituto_Luce_Cinecitt%C3%A0) is an Italian government entity that has been collecting photos of Italy since the
  early 1900s and is part of Cinecittà, one of the largest movie studios in Europe.