slone/LaBSE-en-ru-myv-v1


cointegrated committed on Sep 20, 2022

Commit 967fbbf • 1 Parent(s): d34f966

Update README.md

Files changed (1)

README.md +7 -4


@@ -26,13 +26,16 @@ It is based on [sentence-transformers/LaBSE](https://huggingface.co/sentence-tra
   - Masked language modelling on `myv` monolingual data;
   - Sentence pair classification to distinguish correct `ru-myv` translations from random pairs.
  ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("cointegrated/LaBSE-en-ru")
-model = AutoModel.from_pretrained("cointegrated/LaBSE-en-ru")
 sentences = ["Hello World", "Привет Мир", "Шумбратадо Мастор"]
-encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')
 with torch.no_grad():
     model_output = model(**encoded_input)
 embeddings = model_output.pooler_output
@@ -40,4 +43,4 @@ embeddings = torch.nn.functional.normalize(embeddings)
 print(embeddings.shape)  # torch.Size([3, 768])
 ```
-The model can be used as a sentence encoder or fine-tuned for any downstream NLU task.

   - Masked language modelling on `myv` monolingual data;
   - Sentence pair classification to distinguish correct `ru-myv` translations from random pairs.
+ The model can be used as a sentence encoder or a masked language modelling predictor for Erzya, or fine-tuned for any downstream NLU task.
+ Sentence embeddings can be produced with the code below:
  ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained("slone/LaBSE-en-ru-myv-v1")
+model = AutoModel.from_pretrained("slone/LaBSE-en-ru-myv-v1")
 sentences = ["Hello World", "Привет Мир", "Шумбратадо Мастор"]
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
 with torch.no_grad():
     model_output = model(**encoded_input)
 embeddings = model_output.pooler_output
 print(embeddings.shape)  # torch.Size([3, 768])
 ```
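
Since the model is tagged for sentence similarity, a natural next step after the snippet above is to compare the embeddings pairwise. The sketch below is not part of the README: `cosine_similarity_matrix` is a hypothetical helper name, and the random tensor is only a stand-in with the same `[3, 768]` shape as `model_output.pooler_output`, so the example runs without downloading the model. It assumes cosine similarity is the intended comparison, which is the usual choice for LaBSE-style encoders.

```python
import torch


def cosine_similarity_matrix(embeddings: torch.Tensor) -> torch.Tensor:
    """Return the pairwise cosine similarities between rows of `embeddings`.

    Hypothetical helper, not part of the model card.
    """
    # L2-normalize each row so that a dot product equals cosine similarity.
    normalized = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return normalized @ normalized.T


# Stand-in for `model_output.pooler_output` (assumption: 3 sentences, 768 dims).
torch.manual_seed(0)
fake_embeddings = torch.randn(3, 768)

similarities = cosine_similarity_matrix(fake_embeddings)
print(similarities.shape)  # torch.Size([3, 3])
```

With real embeddings, `similarities[i][j]` would score how close sentence `i` is to sentence `j`; translations of the same sentence would be expected to score higher than unrelated pairs, though the actual values depend on the model.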