vinid committed
Commit 1a58153
1 Parent(s): d7cf7cc

Update introduction.md
Files changed (1): introduction.md (+2 −2)
introduction.md CHANGED
@@ -1,6 +1,6 @@
 # Italian CLIP
 
-CLIP ([Radford et al., 2021](https://arxiv.org/abs/2103.00020)) is an multimodel model that can learn to represent images and text jointly in the same space.
+CLIP ([Radford et al., 2021](https://arxiv.org/abs/2103.00020)) is a multimodal model that can learn to represent images and text jointly in the same space.
 
 In this project, we aim to propose the first CLIP model trained on Italian data, that in this context can be considered a
 low resource language. Using a few techniques, we have been able to fine-tune a SOTA Italian CLIP model with **only 1.4 million** training samples. Our Italian CLIP model
@@ -33,7 +33,7 @@ is going to compute the similarity between the image and each label. The webapp
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/image_to_text.png" alt="drawing" width="95%"/>
 
 + **Localization**: This is a **very cool** feature :sunglasses: and at the best of our knowledge, it is a novel contribution. We can use CLIP
-to find where "something" (like a "cat") is an image. The location of the object is computed by masking different areas of the image and looking at how the similarity to the image description changes.
+to find where "something" (like a "cat") is in an image. The location of the object is computed by masking different areas of the image and looking at how the similarity to the image description changes.
 
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto_cane.png" alt="drawing" width="95%"/>
 
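For readers curious how the masking-based localization described in the changed lines could look in code, here is a minimal sketch. It is an illustration under assumptions, not the demo's implementation: it uses the Hugging Face `transformers` CLIP classes with the English `openai/clip-vit-base-patch32` checkpoint as a stand-in for the Italian model, and a simple sliding grey mask; the clip-italian Space may mask and score regions differently.

```python
# Minimal sketch (not the demo's code) of masking-based localization:
# slide a grey mask over the image and record how much the image-text
# similarity drops; large drops mark regions important for the description.
# Assumption: the English "openai/clip-vit-base-patch32" checkpoint is used
# here as a stand-in for the Italian CLIP model.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_similarity(image: Image.Image, text: str) -> float:
    """Image-text similarity (CLIP logit) for a single image/caption pair."""
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return model(**inputs).logits_per_image[0, 0].item()


def localization_heatmap(
    image: Image.Image, text: str, patch: int = 64, stride: int = 32
) -> np.ndarray:
    """Heatmap where high values mean "masking this region hurts the similarity"."""
    image = image.convert("RGB")
    width, height = image.size
    heat = np.zeros((height, width), dtype=np.float32)
    counts = np.zeros_like(heat)

    base = clip_similarity(image, text)
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            masked = image.copy()
            # Grey out one square region, then re-score the image against the text.
            masked.paste((128, 128, 128), (left, top, left + patch, top + patch))
            drop = base - clip_similarity(masked, text)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    return heat / np.maximum(counts, 1)


# Hypothetical usage (with the Italian model the caption would be Italian, e.g. "un gatto"):
# heatmap = localization_heatmap(Image.open("photo.jpg"), "a cat")
```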