aiana94/NaSE

Sentence Similarity

feature-extraction

sentence-embedding

text-embeddings-inference

Inference Endpoints

aiana94 committed on Jun 18

Commit c859df5 (1 parent: 8e08f41)

Update README.md

Files changed (1)

README.md +15 -6

@@ -161,7 +161,7 @@ Here is how to use this model to get the sentence embeddings of a given text in
     # pepare input
     sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
-    encoded_input = tokenizer.encode(sentences, return_tensors='pt')
+    encoded_input = tokenizer(sentences, return_tensors='pt', padding=True)
     # forward pass
     with torch.no_grad():
@@ -181,7 +181,7 @@ and in Tensorflow:
     # pepare input
     sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
-    encoded_input = tokenizer.encode(sentences, return_tensors='tf')
+    encoded_input = tokenizer(sentences, return_tensors='tf', padding=True)
     # forward pass
     with torch.no_grad():
@@ -191,15 +191,24 @@ and in Tensorflow:
     sentence_embeddings = output.pooler_output
 ```
+For similarity between sentences, an L2-norm is recommended before calculating the similarity:
+```python
+  import torch
+  import torch.nn.functional as F
+  def cos_sim(a: torch.Tensor, b: torch.Tensor):
+    a_norm = F.normalize(a, p=2, dim=1)
+    b_norm = F.normalize(b, p=2, dim=1)
+    return torch.mm(a_norm, b_norm.transpose(0, 1))
+```
 ### Intended Uses
 Our model is intended to be used as a sentence, and in particular, news encoder. Given an input text, it outputs a vector which captures its semantic information.
 The sentence vector may be used for sentence similarity, information retrieval or clustering tasks.
-## Bias, Risks, and Limitations
-[More Information Needed]
 ## Training Details
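
Note on the added `cos_sim` helper: the sketch below is a dependency-free illustration of what that helper computes, namely L2-normalizing each row and then taking pairwise dot products. It is not part of the commit. The names `l2_normalize` and `cos_sim_plain` are hypothetical, and the toy vectors stand in for real sentence embeddings.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 length (hypothetical helper, not from the commit)."""
    norm = math.sqrt(sum(x * x for x in v))
    # Clamp the norm so an all-zero vector does not cause a division by zero.
    norm = max(norm, eps)
    return [x / norm for x in v]


def cos_sim_plain(a: Sequence[Sequence[float]],
                  b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = [l2_normalize(row) for row in a]
    b_norm = [l2_normalize(row) for row in b]
    # After normalization, the dot product of two rows is their cosine similarity.
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b_norm] for ra in a_norm]


# Toy 2-dimensional "embeddings" (assumed values, for illustration only).
sims = cos_sim_plain([[3.0, 4.0], [1.0, 0.0]], [[6.0, 8.0], [0.0, 2.0]])
print(sims)
```

Because each row is normalized first, `[3.0, 4.0]` and `[6.0, 8.0]` score as identical in direction even though their magnitudes differ. Similarity scores computed this way do not depend on embedding length.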