Fredrik committed on
Commit
ff20c09
2 Parent(s): 737c637 67bc61a

Merge remote-tracking branch 'origin/main' into main

Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ <br />
+ <p align="center">
+ <h1 align="center">M-BERT Distil 40</h1>
+
+ <p align="center">
+ <a href="https://github.com/FreddeFrallan/Multilingual-CLIP/tree/main/Model%20Cards/M-BERT%20Distil%2040">GitHub Model Card</a>
+ </p>
+ </p>
+
+ ## Usage
+ To use this model together with the original CLIP vision encoder, you need to download the code and the additional linear weights from the [Multilingual-CLIP GitHub](https://github.com/FreddeFrallan/Multilingual-CLIP).
+
+ Once this is done, you can load and use the model with the following code:
+ ```python
+ # Assumes the downloaded Multilingual-CLIP code is on the import path
+ from src import multilingual_clip
+
+ model = multilingual_clip.load_model('M-BERT-Distil-40')
+ # Embed one sentence each in Swedish, German and Russian
+ embeddings = model(['Älgen är skogens konung!', 'Wie leben Eisbären in der Antarktis?', 'Вы знали, что все белые медведи левши?'])
+ print(embeddings.shape)
+ # Yields: torch.Size([3, 640])
+ ```
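
The text embeddings are intended to be compared against embeddings from the CLIP vision encoder. The following is only a minimal sketch of such a comparison, assuming cosine similarity as the score and using toy 3-dimensional vectors in place of real 640-dimensional model outputs. The helper names `cosine_similarity` and `rank_captions` are hypothetical and not part of the Multilingual-CLIP code.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding: Sequence[float],
                  caption_embeddings: Sequence[Sequence[float]]) -> List[int]:
    """Return caption indices ordered from best to worst match."""
    scores = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins (assumption): one image embedding and three caption embeddings
image = [1.0, 0.0, 0.0]
captions = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
order = rank_captions(image, captions)
print(order)
```

With real data, `image` would come from the vision encoder and `captions` from the text model above, converted to plain lists or handled with the equivalent tensor operations.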
+
+ <!-- ABOUT THE PROJECT -->
+ ## About
+ A [distilbert-base-multilingual](https://huggingface.co/distilbert-base-multilingual-cased) model fine-tuned so that its embeddings for [40 languages](https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/Model%20Cards/M-BERT%20Distil%2040/Fine-Tune-Languages.md) match the embedding space of the CLIP text encoder that accompanies the Res50x4 vision encoder. <br>
+ A full list of the 100 languages used during pre-training can be found [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages), and the 40 languages used during fine-tuning are listed in [Fine-Tune-Languages.md](Fine-Tune-Languages.md).
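
Matching one embedding space to another can be illustrated with a small sketch. It assumes that the multilingual model's output is passed through a linear layer (the "additional linear weights" mentioned under Usage) and that the result is pulled toward the CLIP text embedding with a mean squared error loss. The choice of loss, the function names, and the toy dimensions are all assumptions made for illustration, not the documented training recipe.

```python
from typing import List, Sequence


def linear_project(x: Sequence[float],
                   weight: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def mse_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between a student and a teacher embedding."""
    if len(student) != len(teacher):
        raise ValueError("embeddings must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy example (assumption): a 3-dim student output projected to a 2-dim target space
student_hidden = [1.0, 0.0, 2.0]
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.5]]
bias = [0.0, 0.0]
teacher_embedding = [1.0, 3.0]  # stand-in for a CLIP text embedding

projected = linear_project(student_hidden, weight, bias)
loss = mse_loss(projected, teacher_embedding)
print(projected, loss)
```

In an actual training loop the loss would be minimized with respect to the student model and the projection weights while the teacher stays fixed.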
+
+ Training data pairs were generated by sampling 40k sentences for each language from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/) and translating them into the corresponding language.
+ All translation was done using the [AWS translate service](https://aws.amazon.com/translate/). The quality of these translations has not yet been analyzed, but one can assume that it varies between the 40 languages.
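
The sampling-and-translation step described above could be sketched roughly as follows. The function `build_pairs`, the stub `fake_translate`, and the pairing of each translated sentence with its original caption are assumptions for illustration only. A real pipeline would call a translation service instead of the stub and would sample 40k sentences per language.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def build_pairs(captions: Sequence[str],
                languages: Sequence[str],
                per_language: int,
                translate: Callable[[str, str], str],
                seed: int = 0) -> Dict[str, List[Tuple[str, str]]]:
    """Sample captions independently per language and pair each
    translation with its original caption."""
    rng = random.Random(seed)
    pairs: Dict[str, List[Tuple[str, str]]] = {}
    for lang in languages:
        sampled = rng.sample(list(captions), per_language)
        pairs[lang] = [(translate(text, lang), text) for text in sampled]
    return pairs


def fake_translate(text: str, lang: str) -> str:
    """Hypothetical stand-in for a machine translation call."""
    return f"[{lang}] {text}"


# Toy caption pool (assumption) standing in for the combined datasets
captions = ["a dog on a beach", "two people cooking", "a red bus", "a cat sleeping"]
pairs = build_pairs(captions, ["sv", "de"], per_language=2, translate=fake_translate)
for lang, items in pairs.items():
    print(lang, items)
```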
+
+
+ ## Evaluation
+ [These results can be viewed on GitHub](https://github.com/FreddeFrallan/Multilingual-CLIP/tree/main/Model%20Cards/M-BERT%20Distil%2040). <br>
+ A non-rigorous qualitative evaluation suggests that the model yields respectable results in most instances for French, German, Spanish, Russian, Swedish and Greek. The one exception is Greek, where the model is apparently unable to recognize happy persons. <br>
+ When tested on Kannada, a language that was included during pre-training but not during fine-tuning, the model performed close to random.