aiana94 committed on
Commit
c859df5
1 Parent(s): 8e08f41

Update README.md

Files changed (1)
  1. README.md +15 -6
README.md CHANGED
@@ -161,7 +161,7 @@ Here is how to use this model to get the sentence embeddings of a given text in
161
 
162
  # prepare input
163
  sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
164
- encoded_input = tokenizer.encode(sentences, return_tensors='pt')
165
 
166
  # forward pass
167
  with torch.no_grad():
@@ -181,7 +181,7 @@ and in Tensorflow:
181
 
182
  # prepare input
183
  sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
184
- encoded_input = tokenizer.encode(sentences, return_tensors='tf')
185
 
186
  # forward pass
187
  with torch.no_grad():
@@ -191,15 +191,24 @@ and in Tensorflow:
191
  sentence_embeddings = output.pooler_output
192
  ```
193
 
194
  ### Intended Uses
195
 
196
  Our model is intended to be used as a sentence encoder, and in particular as a news encoder. Given an input text, it outputs a vector that captures its semantic information.
197
  The sentence vector may be used for sentence similarity, information retrieval or clustering tasks.
198
 
199
- ## Bias, Risks, and Limitations
200
-
201
- [More Information Needed]
202
-
203
 
204
  ## Training Details
205
 
 
161
 
162
  # prepare input
163
  sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
164
+ encoded_input = tokenizer(sentences, return_tensors='pt', padding=True)
165
 
166
  # forward pass
167
  with torch.no_grad():
 
181
 
182
  # prepare input
183
  sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
184
+ encoded_input = tokenizer(sentences, return_tensors='tf', padding=True)
185
 
186
  # forward pass
187
  with torch.no_grad():
 
191
  sentence_embeddings = output.pooler_output
192
  ```
193
 
194
+ For similarity between sentences, L2 normalization of the embeddings is recommended before calculating the similarity:
195
+
196
+ ```python
197
+ import torch
198
+ import torch.nn.functional as F
199
+
200
+ def cos_sim(a: torch.Tensor, b: torch.Tensor):
201
+     a_norm = F.normalize(a, p=2, dim=1)
202
+     b_norm = F.normalize(b, p=2, dim=1)
203
+
204
+     return torch.mm(a_norm, b_norm.transpose(0, 1))
205
+ ```
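As a dependency-free illustration of why the normalization step matters, the sketch below reproduces the same computation in plain Python on toy 2-D vectors (hypothetical values, not real model outputs). Once each row has unit L2 length, the matrix product of the rows is their cosine similarity. The helper names `l2_normalize` and `cos_sim_plain` are made up for this example and are not part of the README.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cos_sim_plain(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = [l2_normalize(row) for row in a]
    b_norm = [l2_normalize(row) for row in b]
    # Dot product of unit-length rows, i.e. a_norm @ b_norm^T
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b_norm] for ra in a_norm]

# Toy "embeddings" (assumed values for illustration only)
toy_a = [[3.0, 4.0], [1.0, 0.0]]
toy_b = [[6.0, 8.0], [0.0, 2.0]]
sims = cos_sim_plain(toy_a, toy_b)
```

Here `toy_a[0]` and `toy_b[0]` point in the same direction but differ in length, so their similarity comes out as 1 only because both are normalized first.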
206
+
207
  ### Intended Uses
208
 
209
  Our model is intended to be used as a sentence encoder, and in particular as a news encoder. Given an input text, it outputs a vector that captures its semantic information.
210
  The sentence vector may be used for sentence similarity, information retrieval or clustering tasks.
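For the information retrieval use case, one possible sketch is to rank candidate texts by their similarity to a query. The example assumes the similarity scores between one query embedding and four candidate embeddings have already been computed; the scores and the helper name `rank_by_similarity` are hypothetical and only illustrate the ranking step.

```python
def rank_by_similarity(scores, top_k=2):
    """Return (index, score) pairs for the top_k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Assumed cosine similarities between one query and four candidates
toy_scores = [0.12, 0.87, 0.55, 0.91]
top = rank_by_similarity(toy_scores, top_k=2)
```

The returned indices refer back to positions in the candidate list, so the caller can look up the matching texts.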
211
 
212
 
213
  ## Training Details
214