Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ license: apache-2.0

# sentence-transformers/clip-ViT-B-32-multilingual-v1

-This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close.
+This is a multi-lingual version of the OpenAI CLIP-ViT-B32 model. You can map text (in 50+ languages) and images to a common dense vector space such that images and the matching texts are close. This model can be used for **image search** (users search through a large collection of images) and for **multi-lingual zero-shot image classification** (image labels are defined as text).


## Usage (Sentence-Transformers)
@@ -24,21 +24,69 @@ pip install -U sentence-transformers

Then you can use the model like this:

```python
-from sentence_transformers import SentenceTransformer
+from sentence_transformers import SentenceTransformer, util
+from PIL import Image, ImageFile
+import requests
+import torch
+
+# We use the original clip-ViT-B-32 for encoding images
+img_model = SentenceTransformer('clip-ViT-B-32')
+
+# Our text embedding model is aligned to the img_model and maps 50+
+# languages to the same vector space
+text_model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')
+
+
+# Now we load and encode the images
+def load_image(url_or_path):
+    if url_or_path.startswith("http://") or url_or_path.startswith("https://"):
+        return Image.open(requests.get(url_or_path, stream=True).raw)
+    else:
+        return Image.open(url_or_path)
+
+# We load 3 images. You can either pass URLs or
+# a path on your disc
+img_paths = [
+    # Dog image
+    "https://unsplash.com/photos/QtxgNsmJQSs/download?ixid=MnwxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNjM1ODQ0MjY3&w=640",
+
+    # Cat image
+    "https://unsplash.com/photos/9UUoGaaHtNE/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8Mnx8Y2F0fHwwfHx8fDE2MzU4NDI1ODQ&w=640",
+
+    # Beach image
+    "https://unsplash.com/photos/Siuwr3uCir0/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8NHx8YmVhY2h8fDB8fHx8MTYzNTg0MjYzMg&w=640"
+]
+
+images = [load_image(img) for img in img_paths]
+
+# Map images to the vector space
+img_embeddings = img_model.encode(images)
+
+# Now we encode our text:
+texts = [
+    "A dog in the snow",
+    "Eine Katze",  # German: A cat
+    "Una playa con palmeras."  # Spanish: a beach with palm trees
+]
+
+text_embeddings = text_model.encode(texts)
+
+# Compute cosine similarities:
+cos_sim = util.cos_sim(text_embeddings, img_embeddings)
+
+for text, scores in zip(texts, cos_sim):
+    max_img_idx = torch.argmax(scores)
+    print("Text:", text)
+    print("Score:", scores[max_img_idx])
+    print("Path:", img_paths[max_img_idx], "\n")
+
+```

-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/clip-ViT-B-32-multilingual-v1)
+## Multilingual Image Search - Demo
+For a demo of multilingual image search, have a look at: [Image_Search-multilingual.ipynb](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications/image-search/Image_Search-multilingual.ipynb) ( [Colab version](https://colab.research.google.com/drive/1N6woBKL4dzYsHboDNqtv-8gjZglKOZcn?usp=sharing) )
+
+For more details on image search and zero-shot image classification, have a look at the documentation on [SBERT.net](https://www.sbert.net/examples/applications/image-search/README.html).
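The updated card names **multi-lingual zero-shot image classification** as a second use case, but the new example only demonstrates text-to-image matching. Below is a minimal sketch of that workflow under the same model pairing; the file name `dog.jpg` and the candidate labels are illustrative assumptions, not part of the card:

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Same pairing as in the card's example: CLIP encodes the image,
# the multilingual model encodes the candidate class labels.
img_model = SentenceTransformer('clip-ViT-B-32')
text_model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')

img = Image.open("dog.jpg")  # hypothetical local image file

# Class labels are plain text and may use any of the 50+ supported languages
labels = ["a dog", "eine Katze", "una playa"]  # English, German, Spanish

img_emb = img_model.encode([img])      # one 512-dim image embedding
label_emb = text_model.encode(labels)  # one 512-dim embedding per label

# The label closest to the image in the shared vector space is the prediction
scores = util.cos_sim(img_emb, label_emb)[0]
print("Predicted label:", labels[int(scores.argmax())])
```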
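For the **image search** use case the card links a demo notebook; mechanically, search is a nearest-neighbor lookup of a text query over precomputed image embeddings. Here is a sketch using `util.semantic_search` from the same library; the file paths and the query string are assumptions for illustration:

```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

img_model = SentenceTransformer('clip-ViT-B-32')
text_model = SentenceTransformer('sentence-transformers/clip-ViT-B-32-multilingual-v1')

# Illustrative collection; in practice the image embeddings are computed
# once and stored, since only the query changes between searches.
image_files = ["img1.jpg", "img2.jpg", "img3.jpg"]
img_embeddings = img_model.encode([Image.open(f) for f in image_files])

# Queries may be written in any of the supported languages
query_emb = text_model.encode(["Ein Hund im Schnee"])  # German: a dog in the snow

# Retrieve the top-3 images by cosine similarity
hits = util.semantic_search(query_emb, img_embeddings, top_k=3)[0]
for hit in hits:
    print(image_files[hit['corpus_id']], round(hit['score'], 3))
```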