---

This model is obtained by contrastive-learning fine-tuning of XLM-R on Chinese and English STS and NLI corpora.

## Using HuggingFace Transformers

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('zhou-xl/bi-cse')
model = AutoModel.from_pretrained('zhou-xl/bi-cse')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, CLS pooling (first token of the last hidden state).
sentence_embeddings = model_output[0][:, 0]

# Normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
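Because the embeddings above are L2-normalized, cosine similarity between two sentences reduces to a plain dot product. A minimal sketch of how you might compare them (the random `embeddings` tensor here is a stand-in for the model output, not part of the original example):

```python
import torch
import torch.nn.functional as F

# Stand-in for the model's pooled output: two 768-dim "embeddings".
# In practice, use the sentence_embeddings computed in the snippet above.
embeddings = torch.randn(2, 768)

# L2-normalize so that cosine similarity becomes a dot product
embeddings = F.normalize(embeddings, p=2, dim=1)

# Cosine similarity between the two sentences, in [-1, 1]
similarity = (embeddings[0] @ embeddings[1]).item()
print("Cosine similarity:", similarity)
```

Higher values indicate more semantically similar sentences, which is the typical way to use a sentence-embedding model for retrieval or STS-style scoring.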