Update README.md
README.md
CHANGED
@@ -161,7 +161,7 @@ Here is how to use this model to get the sentence embeddings of a given text in

# pepare input
sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
-encoded_input = tokenizer

# forward pass
with torch.no_grad():
@@ -181,7 +181,7 @@ and in Tensorflow:

# pepare input
sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
-encoded_input = tokenizer

# forward pass
with torch.no_grad():
@@ -191,15 +191,24 @@ and in Tensorflow:
sentence_embeddings = output.pooler_output
```

### Intended Uses

Our model is intended to be used as a sentence, and in particular, news encoder. Given an input text, it outputs a vector which captures its semantic information.
The sentence vector may be used for sentence similarity, information retrieval or clustering tasks.

-## Bias, Risks, and Limitations
-
-[More Information Needed]
-

## Training Details


# pepare input
sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
+encoded_input = tokenizer(sentences, return_tensors='pt', padding=True)

# forward pass
with torch.no_grad():

# pepare input
sentences = ["This is an example sentence", "Dies ist auch ein Beispielsatz in einer anderen Sprache."]
+encoded_input = tokenizer(sentences, return_tensors='tf', padding=True)

# forward pass
with torch.no_grad():
sentence_embeddings = output.pooler_output
```

+For similarity between sentences, an L2-norm is recommended before calculating the similarity:
+
+```python
+import torch
+import torch.nn.functional as F
+
+def cos_sim(a: torch.Tensor, b: torch.Tensor):
+    a_norm = F.normalize(a, p=2, dim=1)
+    b_norm = F.normalize(b, p=2, dim=1)
+
+    return torch.mm(a_norm, b_norm.transpose(0, 1))
+```
+
### Intended Uses

Our model is intended to be used as a sentence, and in particular, news encoder. Given an input text, it outputs a vector which captures its semantic information.
The sentence vector may be used for sentence similarity, information retrieval or clustering tasks.


## Training Details

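The `cos_sim` helper added in this commit L2-normalizes every row of `a` and `b` and then multiplies `a_norm` by the transpose of `b_norm`, so entry (i, j) is the cosine similarity between row i of `a` and row j of `b`. The dependency-free sketch below mirrors that computation on plain lists, including the `eps` clamp that `F.normalize` applies by default (1e-12). The names `normalize_rows` and `cos_sim_matrix` and the toy vectors are illustrative assumptions, not part of the README.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalize_rows(rows: Sequence[Sequence[float]], eps: float = 1e-12) -> Matrix:
    """Divide each row by its L2 norm, clamped below by eps as F.normalize does."""
    out = []
    for row in rows:
        norm = max(math.sqrt(sum(x * x for x in row)), eps)
        out.append([x / norm for x in row])
    return out


def cos_sim_matrix(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Entry [i][j] is the cosine similarity of a[i] and b[j]: a_norm @ b_norm.T."""
    a_norm = normalize_rows(a)
    b_norm = normalize_rows(b)
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b_norm] for ra in a_norm]


# Toy 2-dimensional vectors standing in for sentence embeddings.
a = [[3.0, 4.0], [1.0, 0.0]]
b = [[6.0, 8.0], [0.0, 2.0]]

for row in cos_sim_matrix(a, b):
    print([round(v, 3) for v in row])
# prints:
# [1.0, 0.8]
# [0.6, 0.0]
```

Without the normalization step, the raw dot product of `[3, 4]` and `[6, 8]` would be 50, so scores would grow with vector length instead of reflecting direction alone; normalizing first keeps every score in the range [-1, 1].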