nreimers committed
Commit fc29aef
1 Parent(s): 3f99938

Update README.md

Files changed (1)
  1. README.md +57 -9
README.md CHANGED
@@ -10,7 +10,7 @@ license: apache-2.0

# sentence-transformers/clip-ViT-B-32-multilingual-v1

- This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close.
+ This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close. This model can be used for **image search** (users search through a large collection of images) and for **multi-lingual zero-shot image classification** (image labels are defined as text).

## Usage (Sentence-Transformers)
@@ -24,21 +24,69 @@ pip install -U sentence-transformers
Then you can use the model like this:

```python
- from sentence_transformers import SentenceTransformer
- sentences = ["This is an example sentence", "Each sentence is converted"]
-
- model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
- embeddings = model.encode(sentences)
- print(embeddings)
- ```

- ## Evaluation Results

- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/clip-ViT-B-32-multilingual-v1)
 
+ from sentence_transformers import SentenceTransformer, util
+ from PIL import Image, ImageFile
+ import requests
+ import torch
+
+ # We use the original clip-ViT-B-32 for encoding images
+ img_model = SentenceTransformer('clip-ViT-B-32')
+
+ # Our text embedding model is aligned to the img_model and maps 50+
+ # languages to the same vector space
+ text_model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
+
+
+ # Now we load and encode the images
+ def load_image(url_or_path):
+     if url_or_path.startswith("http://") or url_or_path.startswith("https://"):
+         return Image.open(requests.get(url_or_path, stream=True).raw)
+     else:
+         return Image.open(url_or_path)
+
+ # We load 3 images. You can either pass URLs or
+ # a path on your disc
+ img_paths = [
+     # Dog image
+     "https://unsplash.com/photos/QtxgNsmJQSs/download?ixid=MnwxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNjM1ODQ0MjY3&w=640",
+
+     # Cat image
+     "https://unsplash.com/photos/9UUoGaaHtNE/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8Mnx8Y2F0fHwwfHx8fDE2MzU4NDI1ODQ&w=640",
+
+     # Beach image
+     "https://unsplash.com/photos/Siuwr3uCir0/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8NHx8YmVhY2h8fDB8fHx8MTYzNTg0MjYzMg&w=640"
+ ]
+
+ images = [load_image(img) for img in img_paths]
+
+ # Map images to the vector space
+ img_embeddings = img_model.encode(images)
+
+ # Now we encode our text:
+ texts = [
+     "A dog in the snow",
+     "Eine Katze",  # German: A cat
+     "Una playa con palmeras."  # Spanish: a beach with palm trees
+ ]
+
+ text_embeddings = text_model.encode(texts)
+
+ # Compute cosine similarities:
+ cos_sim = util.cos_sim(text_embeddings, img_embeddings)
+
+ for text, scores in zip(texts, cos_sim):
+     max_img_idx = torch.argmax(scores)
+     print("Text:", text)
+     print("Score:", scores[max_img_idx])
+     print("Path:", img_paths[max_img_idx], "\n")
+ ```

+ ## Multilingual Image Search - Demo
+ For a demo of multilingual image search, have a look at: [Image_Search-multilingual.ipynb](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications/image-search/Image_Search-multilingual.ipynb) ([Colab version](https://colab.research.google.com/drive/1N6woBKL4dzYsHboDNqtv-8gjZglKOZcn?usp=sharing))

+ For more details on image search and zero-shot image classification, have a look at the documentation on [SBERT.net](https://www.sbert.net/examples/applications/image-search/README.html).
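The snippet added by this commit ranks images for each text. The reverse direction is the multi-lingual zero-shot image classification use case mentioned in the updated description. The sketch below is illustrative rather than part of the commit: the label texts are made up and `my_image.jpg` is a placeholder path.

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image
import torch

# Image encoder and the aligned multilingual text encoder
img_model = SentenceTransformer('clip-ViT-B-32')
text_model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')

# Class labels written as plain text; any of the 50+ supported languages works
labels = ["a photo of a dog", "una foto de un gato", "ein Foto von einem Strand"]
label_embeddings = text_model.encode(labels)

# Encode one image ("my_image.jpg" is a placeholder) and pick the closest label
img_embedding = img_model.encode([Image.open("my_image.jpg")])
scores = util.cos_sim(img_embedding, label_embeddings)[0]
print("Predicted label:", labels[torch.argmax(scores)])
```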
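For the image search use case, a text query in any supported language can be matched against a pre-computed collection of image embeddings. The sketch below is one possible continuation of the commit's snippet and reuses its `text_model`, `img_embeddings`, and `img_paths` variables; the French query string is illustrative, and `util.semantic_search` is just one way to retrieve the closest images.

```python
# Encode a query with the multilingual text model (French: "A dog in the snow")
query_embedding = text_model.encode("Un chien dans la neige")

# Retrieve the images whose embeddings are closest to the query
hits = util.semantic_search(query_embedding, img_embeddings, top_k=3)[0]
for hit in hits:
    print(img_paths[hit['corpus_id']], "score:", hit['score'])
```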