Universal AnglE Embedding
Follow us on:
- GitHub: https://github.com/SeanLee97/AnglE
- arXiv: https://arxiv.org/abs/2309.12871
🔥 Our universal English sentence embedding model WhereIsAI/UAE-Large-V1 achieves SOTA on the MTEB Leaderboard with an average score of 64.64!
Usage
```bash
python -m pip install -U angle-emb
```
- Non-Retrieval Tasks
```python
from angle_emb import AnglE

# Load the model with CLS pooling and move it to the GPU
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

# Encode a single sentence
vec = angle.encode('hello world', to_numpy=True)
print(vec)

# Encode a batch of sentences
vecs = angle.encode(['hello world1', 'hello world2'], to_numpy=True)
print(vecs)
```
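The returned embeddings can be compared directly with cosine similarity. A minimal sketch, continuing from the snippet above; the cos_sim helper is our own illustration, not part of the angle_emb API:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Reuses the `angle` model loaded in the snippet above
vecs = angle.encode(['hello world1', 'hello world2'], to_numpy=True)
print(cos_sim(vecs[0], vecs[1]))  # closer to 1.0 means more similar
```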
- Retrieval Tasks
For retrieval purposes, please use the prompt Prompts.C.
```python
from angle_emb import AnglE, Prompts

# Load the model with CLS pooling and move it to the GPU
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

# Set the retrieval prompt; inputs are then passed as dicts with a 'text' key
angle.set_prompt(prompt=Prompts.C)

vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec)

vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True)
print(vecs)
```
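Putting the pieces together, retrieval amounts to encoding documents without a prompt, encoding the query with Prompts.C, and ranking by cosine similarity. A minimal sketch; the toy corpus, the query, and the ranking code are our own illustration, not part of the angle_emb API:

```python
import numpy as np
from angle_emb import AnglE, Prompts

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

# Toy corpus for illustration; documents are encoded without a prompt
docs = [
    'AnglE optimizes the angle difference between text embeddings.',
    'The weather in Paris is mild in spring.',
]
doc_vecs = angle.encode(docs, to_numpy=True)

# The query is encoded with the retrieval prompt Prompts.C
angle.set_prompt(prompt=Prompts.C)
query_vec = angle.encode({'text': 'What does AnglE optimize?'}, to_numpy=True)[0]

# Rank documents by cosine similarity to the query
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for i in np.argsort(-scores):
    print(f'{scores[i]:.3f}  {docs[i]}')
```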
Citation
If you use our pre-trained models, please support us by citing our work:
```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```
Evaluation results
All scores are self-reported on the respective MTEB test sets.

| Dataset | Metric | Score |
|---|---|---|
| AmazonCounterfactualClassification (en) | accuracy | 75.552 |
| AmazonCounterfactualClassification (en) | ap | 38.264 |
| AmazonCounterfactualClassification (en) | f1 | 69.410 |
| AmazonPolarityClassification | accuracy | 92.843 |
| AmazonPolarityClassification | ap | 89.576 |
| AmazonPolarityClassification | f1 | 92.826 |
| AmazonReviewsClassification (en) | accuracy | 48.292 |
| AmazonReviewsClassification (en) | f1 | 47.903 |
| ArguAna | map_at_1 | 42.105 |
| ArguAna | map_at_10 | 58.181 |
| ArguAna | map_at_100 | 58.654 |
| ArguAna | map_at_1000 | 58.657 |
| ArguAna | map_at_3 | 54.386 |
| ArguAna | map_at_5 | 56.758 |
| ArguAna | mrr_at_1 | 42.745 |
| ArguAna | mrr_at_10 | 58.437 |
| ArguAna | mrr_at_100 | 58.895 |
| ArguAna | mrr_at_1000 | 58.898 |
| ArguAna | mrr_at_3 | 54.635 |
| ArguAna | mrr_at_5 | 57.000 |
| ArguAna | ndcg_at_1 | 42.105 |
| ArguAna | ndcg_at_10 | 66.150 |
| ArguAna | ndcg_at_100 | 68.048 |
| ArguAna | ndcg_at_1000 | 68.114 |
| ArguAna | ndcg_at_3 | 58.477 |
| ArguAna | ndcg_at_5 | 62.768 |
| ArguAna | precision_at_1 | 42.105 |
| ArguAna | precision_at_10 | 9.111 |
| ArguAna | precision_at_100 | 0.991 |
| ArguAna | precision_at_1000 | 0.100 |
| ArguAna | precision_at_3 | 23.447 |
| ArguAna | precision_at_5 | 16.159 |
| ArguAna | recall_at_1 | 42.105 |
| ArguAna | recall_at_10 | 91.110 |
| ArguAna | recall_at_100 | 99.147 |
| ArguAna | recall_at_1000 | 99.644 |
| ArguAna | recall_at_3 | 70.341 |
| ArguAna | recall_at_5 | 80.797 |
| ArxivClusteringP2P | v_measure | 49.026 |
| ArxivClusteringS2S | v_measure | 43.094 |
| AskUbuntuDupQuestions | map | 64.196 |
| AskUbuntuDupQuestions | mrr | 77.095 |
| BIOSSES | cos_sim_pearson | 87.867 |
| BIOSSES | cos_sim_spearman | 86.142 |
| BIOSSES | euclidean_pearson | 85.990 |
| BIOSSES | euclidean_spearman | 86.482 |
| BIOSSES | manhattan_pearson | 85.645 |
| BIOSSES | manhattan_spearman | 86.210 |
| Banking77Classification | accuracy | 87.692 |
| Banking77Classification | f1 | 87.681 |
| BiorxivClusteringP2P | v_measure | 39.375 |