FreddeFrallan commited on
Commit
b8ff95e
1 Parent(s): 8b8b311

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ print(embeddings.shape)
22
 
23
  <!-- ABOUT THE PROJECT -->
24
  ## About
25
- A [distilbert-base-multilingual](https://huggingface.co/distilbert-base-multilingual-cased) tuned to match the embedding space for [69 languages](https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/Model%20Cards/M-BERT%20Base%2069/Fine-Tune-Languages.md), to the embedding space of the CLIP text encoder which accompanies the Res50x4 vision encoder. <br>
26
  A full list of the 100 languages used during pre-training can be found [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages), and a list of the 4069languages used during fine-tuning can be found in [SupportedLanguages.md](https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/Model%20Cards/M-BERT%20Base%2069/Fine-Tune-Languages.md).
27
 
28
  Training data pairs was generated by sampling 40k sentences for each language from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/), and translating them into the corresponding language.
22
 
23
  <!-- ABOUT THE PROJECT -->
24
  ## About
25
+ A [BERT-base-multilingual](https://huggingface.co/bert-base-multilingual-cased) tuned to match the embedding space for [69 languages](https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/Model%20Cards/M-BERT%20Base%2069/Fine-Tune-Languages.md), to the embedding space of the CLIP text encoder which accompanies the Res50x4 vision encoder. <br>
26
  A full list of the 100 languages used during pre-training can be found [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages), and a list of the 4069languages used during fine-tuning can be found in [SupportedLanguages.md](https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/Model%20Cards/M-BERT%20Base%2069/Fine-Tune-Languages.md).
27
 
28
  Training data pairs was generated by sampling 40k sentences for each language from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/), and translating them into the corresponding language.