xiaobu-embedding

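Model: multi-task fine-tuned from the GTE model [1].
Data: chit-chat Query-Query pairs, knowledge-oriented Query-Doc pairs, and the open-source BGE Query-Doc data [2]; positives were cleaned and medium-difficulty negatives were mined; 6M in total (quality matters more than quantity).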
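
The card does not say how the medium-difficulty negatives were mined. One common approach, shown here only as a hedged sketch, is to rank candidates by similarity to the query, drop the labeled positives, and sample negatives from a middle rank window rather than from the very top (likely false negatives) or the bottom (too easy). The function name `mine_mid_negatives`, the `start`/`stop` window, and the toy scores are all hypothetical and not taken from this model's training pipeline.

```python
def mine_mid_negatives(scores, positives, start=2, stop=5):
    """Return negatives from a rank window of similarity-sorted candidates.

    scores:    dict mapping candidate id -> similarity to the query
    positives: set of candidate ids labeled as relevant
    The window [start, stop) is applied AFTER positives are removed.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Drop labeled positives so they can never be sampled as negatives.
    ranked = [c for c in ranked if c not in positives]
    return ranked[start:stop]


# Toy similarity scores for one query (illustrative values only).
scores = {"d1": 0.95, "d2": 0.90, "d3": 0.80, "d4": 0.60, "d5": 0.40, "d6": 0.10}
negatives = mine_mid_negatives(scores, positives={"d1"}, start=1, stop=3)
print(negatives)
```

Skipping the top-ranked non-positive (`d2` here) is a design choice meant to reduce the chance of picking an unlabeled true match; the right window depends on the data.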

Usage (Sentence-Transformers)

pip install -U sentence-transformers

Similarity computation:

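from sentence_transformers import SentenceTransformer

# Placeholder Chinese sample sentences; replace with your own text.
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]

model = SentenceTransformer('lier007/xiaobu-embedding')

# normalize_embeddings=True returns unit-length vectors,
# so the dot product below is the cosine similarity.
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)

# similarity[i, j] compares sentences_1[i] with sentences_2[j].
similarity = embeddings_1 @ embeddings_2.T
print(similarity)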
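
To see why the snippet above can use a plain matrix product, the following minimal sketch reproduces the same computation in pure Python on toy 2-D vectors: once vectors are L2-normalized, their dot product equals their cosine similarity. The helper names (`l2_normalize`, `similarity_matrix`) and the vectors are illustrative assumptions, not part of the model or of sentence-transformers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_matrix(rows_a, rows_b):
    """Pairwise dot products: a pure-Python analogue of A @ B.T."""
    return [[dot(a, b) for b in rows_b] for a in rows_a]


# Toy vectors standing in for model embeddings (illustrative values only).
emb_1 = [l2_normalize(v) for v in [[3.0, 4.0], [1.0, 0.0]]]
emb_2 = [l2_normalize(v) for v in [[6.0, 8.0], [0.0, 2.0]]]

sim = similarity_matrix(emb_1, emb_2)
# For each row of emb_1, the index of the most similar row in emb_2.
best_match = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(sim, best_match)
```

With real embeddings, `sim[i][j]` plays the role of `similarity[i, j]` in the usage example, and taking the per-row maximum is one simple way to pick the closest sentence.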

Evaluation

Refer to the BGE Chinese CMTEB evaluation [2].

Finetune

Refer to the BGE fine-tuning module [2].

Reference

  1. https://huggingface.co/thenlper/gte-large-zh
  2. https://github.com/FlagOpen/FlagEmbedding