This model fine-tunes XLM-R with contrastive learning on Chinese and English STS and NLI corpora.
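The contrastive fine-tuning described above is commonly implemented as an in-batch-negatives InfoNCE objective (SimCSE-style). The sketch below illustrates that objective only; the function name, temperature value, and toy tensors are illustrative assumptions, not the model's actual training code:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, temperature=0.05):
    """In-batch-negatives InfoNCE loss over two views of a batch.

    emb_a, emb_b: (batch, dim) embeddings of paired sentences.
    Row i of emb_a is pulled toward row i of emb_b and pushed away
    from every other row in the batch.
    """
    emb_a = F.normalize(emb_a, p=2, dim=1)
    emb_b = F.normalize(emb_b, p=2, dim=1)
    # Cosine-similarity matrix scaled by temperature: (batch, batch)
    sim = emb_a @ emb_b.T / temperature
    # The positive pair for row i sits on the diagonal (column i)
    labels = torch.arange(sim.size(0))
    return F.cross_entropy(sim, labels)

# Toy usage with random stand-in embeddings
a, b = torch.randn(8, 32), torch.randn(8, 32)
loss = contrastive_loss(a, b)
```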
Usage with HuggingFace Transformers:

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]

# Load model from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('zhou-xl/bi-cse')
model = AutoModel.from_pretrained('zhou-xl/bi-cse')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, CLS pooling: take the first token's hidden state.
sentence_embeddings = model_output[0][:, 0]

# Normalize embeddings to unit length
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
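Because the embeddings above are L2-normalized, cosine similarity between two sentences reduces to a plain dot product. A minimal sketch, using random tensors as stand-ins for the model output:

```python
import torch

# Stand-ins for a batch of normalized CLS embeddings from the model
emb = torch.nn.functional.normalize(torch.randn(2, 768), p=2, dim=1)

# For unit vectors, cosine similarity is just a dot product
cos_sim = emb[0] @ emb[1]

# Pairwise similarity matrix for the whole batch; the diagonal is 1.0
sim_matrix = emb @ emb.T
```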
Evaluation results

All scores below are self-reported MTEB results.

| Dataset | Split | cos_sim Pearson | cos_sim Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman |
|---|---|---|---|---|---|---|---|
| AFQMC | validation | 42.010 | 43.449 | 41.933 | 43.457 | 41.930 | 43.445 |
| ATEC | test | 47.484 | 48.010 | | | | |
| BIOSSES | test | 70.066 | 70.564 | | | | |
| BQ | test | 63.306 | 65.578 | 64.417 | 65.602 | 64.375 | 65.570 |
| LCQMC | test | 71.310 | 75.556 | 74.371 | 75.548 | 74.298 | 75.490 |
| PAWSX | test | 42.822 | 47.617 | 46.992 | 47.624 | 46.835 | 47.473 |
| QBQTC | test | 39.483 | 40.433 | 39.121 | 40.478 | 39.070 | 40.413 |
| SICK-R | test | 81.605 | 79.043 | 78.952 | 78.992 | 78.940 | 78.941 |
| STS12 | test | 85.505 | 78.393 | 83.039 | 78.431 | 83.007 | 78.338 |

Bitext mining:

| Dataset | Split | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|
| BUCC (zh-en) | test | 98.420 | 98.385 | 98.368 | 98.420 |
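The Spearman scores reported above are rank correlations between predicted cosine similarities and gold similarity labels. A minimal self-contained computation, with made-up illustrative numbers (not taken from any benchmark):

```python
import torch

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    def ranks(v):
        # argsort applied twice yields each element's rank (no tie handling)
        return v.argsort().argsort().float()
    rx, ry = ranks(x), ranks(y)
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    return (rx @ ry) / (rx.norm() * ry.norm())

# Illustrative: predicted cosine similarities vs. gold labels
pred = torch.tensor([0.9, 0.1, 0.5, 0.7])
gold = torch.tensor([5.0, 0.0, 2.0, 4.0])
# The ranks agree exactly here, so the correlation is 1.0
print(spearman(pred, gold))
```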