Universal AnglE Embedding
📢 WhereIsAI/UAE is licensed under MIT. Feel free to use it in any scenario.
Follow us on:
- GitHub: https://github.com/SeanLee97/AnglE
- arXiv: https://arxiv.org/abs/2309.12871
🔥 Our universal English sentence embedding WhereIsAI/UAE-Large-V1
achieves SOTA on the MTEB Leaderboard with an average score of 64.64!
Usage
1. angle_emb
python -m pip install -U angle-emb
- Non-Retrieval Tasks
There is no need to specify any prompts.
from angle_emb import AnglE
from scipy import spatial
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
doc_vecs = angle.encode([
    'The weather is great!',
    'The weather is very good!',
    'i am going to bed'
])

for i, dv1 in enumerate(doc_vecs):
    for dv2 in doc_vecs[i + 1:]:
        print(1 - spatial.distance.cosine(dv1, dv2))
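For larger batches, the pairwise loop above can be vectorized. A minimal sketch with NumPy, where the hard-coded `doc_vecs` stands in for the matrix returned by `angle.encode`:

```python
import numpy as np

# Stand-in for the (n_docs x dim) matrix returned by angle.encode.
doc_vecs = np.array([
    [0.1, 0.3, 0.5],
    [0.2, 0.4, 0.4],
    [0.9, 0.1, 0.0],
])

# L2-normalize each row; a single matmul then yields every pairwise
# cosine similarity at once instead of one scipy call per pair.
unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
sims = unit @ unit.T

print(sims)  # sims[i, j] is the cosine similarity of doc i and doc j
```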
- Retrieval Tasks
For retrieval purposes, please use the prompt Prompts.C for queries (not for documents).
from angle_emb import AnglE, Prompts
from scipy import spatial
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
qv = angle.encode(Prompts.C.format(text='what is the weather?'))
doc_vecs = angle.encode([
    'The weather is great!',
    'it is rainy today.',
    'i am going to bed'
])

for dv in doc_vecs:
    print(1 - spatial.distance.cosine(qv[0], dv))
2. sentence-transformers
from angle_emb import Prompts
from scipy import spatial
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("WhereIsAI/UAE-Large-V1").cuda()
qv = model.encode(Prompts.C.format(text='what is the weather?'))
doc_vecs = model.encode([
    'The weather is great!',
    'it is rainy today.',
    'i am going to bed'
])

for dv in doc_vecs:
    print(1 - spatial.distance.cosine(qv, dv))
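To turn the per-document scores into a ranking (for either usage path above), sorting by descending cosine similarity is enough. A hedged sketch with placeholder vectors; in practice `qv` and `doc_vecs` come from `angle.encode` / `model.encode`:

```python
import numpy as np

docs = ['The weather is great!', 'it is rainy today.', 'i am going to bed']

# Placeholder embeddings, NOT real model output.
qv = np.array([1.0, 0.2, 0.0])
doc_vecs = np.array([
    [0.9, 0.3, 0.1],
    [0.6, 0.1, 0.7],
    [0.0, 0.1, 0.9],
])

# Cosine similarity of the query against every document.
sims = doc_vecs @ qv / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(qv))

# Print documents from most to least similar to the query.
for idx in np.argsort(-sims):
    print(f'{sims[idx]:.4f}  {docs[idx]}')
```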
Citation
If you use our pre-trained models, please consider supporting us by citing our work:
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
Evaluation results
All scores are self-reported on the corresponding MTEB test sets.

| Dataset | Metric | Value |
|---|---|---|
| AmazonCounterfactualClassification (en) | accuracy | 75.552 |
| AmazonCounterfactualClassification (en) | ap | 38.264 |
| AmazonCounterfactualClassification (en) | f1 | 69.410 |
| AmazonPolarityClassification | accuracy | 92.843 |
| AmazonPolarityClassification | ap | 89.576 |
| AmazonPolarityClassification | f1 | 92.826 |
| AmazonReviewsClassification (en) | accuracy | 48.292 |
| AmazonReviewsClassification (en) | f1 | 47.903 |
| ArxivClusteringP2P | v_measure | 49.026 |
| ArxivClusteringS2S | v_measure | 43.094 |
| AskUbuntuDupQuestions | map | 64.196 |
| AskUbuntuDupQuestions | mrr | 77.095 |
| BIOSSES | cos_sim_pearson | 87.867 |
| BIOSSES | cos_sim_spearman | 86.142 |
| BIOSSES | euclidean_pearson | 85.990 |
| BIOSSES | euclidean_spearman | 86.482 |
| BIOSSES | manhattan_pearson | 85.645 |
| BIOSSES | manhattan_spearman | 86.210 |
| Banking77Classification | accuracy | 87.692 |
| Banking77Classification | f1 | 87.681 |
| BiorxivClusteringP2P | v_measure | 39.375 |

ArguAna (retrieval, test set), by cutoff:

| Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
|---|---|---|---|---|---|---|
| map | 42.105 | 54.386 | 56.758 | 58.181 | 58.654 | 58.657 |
| mrr | 42.745 | 54.635 | 57.000 | 58.437 | 58.895 | 58.898 |
| ndcg | 42.105 | 58.477 | 62.768 | 66.150 | 68.048 | 68.114 |
| precision | 42.105 | 23.447 | 16.159 | 9.111 | 0.991 | 0.100 |
| recall | 42.105 | 70.341 | 80.797 | 91.110 | 99.147 | 99.644 |