EIStakovskii committed on
Commit 3f32ade
1 Parent(s): 6151406

Update README.md

Files changed (1)
  1. README.md +25 -1
README.md CHANGED
@@ -4,4 +4,28 @@ tags:
  - feature-extraction
  - embeddings
  - sentence-similarity
- ---
+ ---
+ # LaBSE for French and German
+ This is a shortened version of [sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE). The model was prepared with direct help from [cointegrated](https://huggingface.co/cointegrated), the author of the [LaBSE-en-ru model](https://huggingface.co/cointegrated/LaBSE-en-ru).
+
+ The current model keeps only French and German tokens, so its vocabulary is 10% of the original size, while the whole model has 27% of the original number of parameters.
+
+ To get the sentence embeddings, you can use the following code:
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+ tokenizer = AutoTokenizer.from_pretrained("EIStakovskii/LaBSE-fr-de")
+ model = AutoModel.from_pretrained("EIStakovskii/LaBSE-fr-de")
+ sentences = ["Wie geht es dir?", "Comment vas-tu?"]  # German and French for "How are you?"
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')
+ with torch.no_grad():  # inference only, no gradients needed
+     model_output = model(**encoded_input)
+ embeddings = model_output.pooler_output  # pooled [CLS] vector, one row per sentence
+ embeddings = torch.nn.functional.normalize(embeddings)  # L2-normalize each row to unit length
+ print(embeddings)
+ ```
+
+ ## Reference
+ Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang. [Language-agnostic BERT Sentence Embedding](https://arxiv.org/abs/2007.01852). July 2020
+
+ License: [https://tfhub.dev/google/LaBSE/1](https://tfhub.dev/google/LaBSE/1)
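The README reports a 10% vocabulary but 27% of the parameters. Most likely the two figures differ because pruning the vocabulary only shrinks the token-embedding matrix, while the transformer layers stay as they are. The sketch below makes that arithmetic concrete. It assumes LaBSE follows the standard BERT-base layout of Hugging Face's `BertModel` (hidden size 768, 12 layers, feed-forward size 3072, 512 positions, 2 token types, with a pooler) and an original vocabulary of 501,153 tokens. None of these figures come from this commit, and `bert_param_count` is a hypothetical helper, not a library function.

```python
# Back-of-the-envelope parameter count for a BERT encoder (BertModel layout, pooler included).
# All default dimensions are ASSUMED BERT-base values, not read from the LaBSE checkpoint.


def bert_param_count(vocab_size: int, hidden: int = 768, layers: int = 12,
                     intermediate: int = 3072, max_positions: int = 512,
                     type_vocab: int = 2) -> int:
    """Return the number of trainable parameters of a BERT encoder with the given sizes."""
    # Embeddings: word, position and token-type tables, plus one LayerNorm (weight and bias)
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Self-attention: Q, K, V and output projections (weights and biases) plus a LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Feed-forward: up- and down-projections (weights and biases) plus a LayerNorm
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + 2 * hidden
    # Pooler: one dense layer applied to the [CLS] vector
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


ORIGINAL_VOCAB = 501_153  # ASSUMED size of the original LaBSE vocabulary
REDUCED_VOCAB = round(ORIGINAL_VOCAB * 0.10)  # "10% of the original", as the README states

full = bert_param_count(ORIGINAL_VOCAB)
reduced = bert_param_count(REDUCED_VOCAB)
ratio = reduced / full

print(f"full model:    {full:,} parameters")
print(f"reduced model: {reduced:,} parameters")
print(f"ratio:         {ratio:.1%}")
```

Under these assumptions the script reports 470,926,848 parameters for the full model and a ratio of 26.4%, in the same range as the README's 27% (the exact figure depends on how many tokens were actually kept). Even with an empty vocabulary, roughly 86 million encoder parameters would remain, which is why a tenth of the vocabulary does not yield a tenth of the model.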
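The README's snippet L2-normalizes the pooled embeddings, so comparing the German and the French sentence reduces to a dot product between unit vectors. A minimal sketch of that step, assuming PyTorch as in the README; `similarity_matrix` is a hypothetical helper name, and the random tensor only stands in for the real `embeddings` output (768 dimensions, assumed to match LaBSE's hidden size).

```python
import torch
import torch.nn.functional as F


def similarity_matrix(embeddings: torch.Tensor) -> torch.Tensor:
    """Return the pairwise cosine-similarity matrix for a (batch, dim) tensor."""
    # Normalizing again is harmless if the rows are already unit length
    unit = F.normalize(embeddings, p=2, dim=1)
    # For unit-length rows, the dot product equals the cosine similarity
    return unit @ unit.T


if __name__ == "__main__":
    # Random stand-in for the README's `embeddings` (2 sentences x 768 dims); illustrative only
    torch.manual_seed(0)
    fake_embeddings = torch.randn(2, 768)
    print(similarity_matrix(fake_embeddings))
```

With the README's variables in scope, `similarity_matrix(embeddings)[0, 1]` would give the cosine similarity between "Wie geht es dir?" and "Comment vas-tu?". Since both sentences translate "How are you?", one would expect a high score, though no particular value is claimed here.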