Universal AnglE Embedding
Follow us on:
- GitHub: https://github.com/SeanLee97/AnglE
- arXiv: https://arxiv.org/abs/2309.12871
🔥 Our universal English sentence embedding model WhereIsAI/UAE-Large-V1 achieves SOTA on the MTEB Leaderboard with an average score of 64.64!
Usage
```bash
python -m pip install -U angle-emb
```
- Non-Retrieval Tasks
```python
from angle_emb import AnglE

# Load the model with CLS pooling and move it to the GPU
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

# Encode a single sentence
vec = angle.encode('hello world', to_numpy=True)
print(vec)

# Encode a batch of sentences
vecs = angle.encode(['hello world1', 'hello world2'], to_numpy=True)
print(vecs)
```
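The returned embeddings can be compared directly with cosine similarity. A minimal sketch, continuing from the snippet above; the cos_sim helper is our own illustration, not part of the angle_emb API:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Reuses the `angle` model loaded in the snippet above
vecs = angle.encode(['hello world1', 'hello world2'], to_numpy=True)
print(cos_sim(vecs[0], vecs[1]))  # closer to 1.0 means more similar
```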
- Retrieval Tasks
For retrieval purposes, please use the prompt Prompts.C.
```python
from angle_emb import AnglE, Prompts

# Load the model with CLS pooling and move it to the GPU
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

# Set the retrieval prompt; inputs are then passed as dicts with a 'text' key
angle.set_prompt(prompt=Prompts.C)

vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec)

vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True)
print(vecs)
```
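Putting the pieces together, retrieval amounts to encoding documents without a prompt, encoding the query with Prompts.C, and ranking by cosine similarity. A minimal sketch; the toy corpus, the query, and the ranking code are our own illustration, not part of the angle_emb API:

```python
import numpy as np
from angle_emb import AnglE, Prompts

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

# Toy corpus for illustration; documents are encoded without a prompt
docs = [
    'AnglE optimizes the angle difference between text embeddings.',
    'The weather in Paris is mild in spring.',
]
doc_vecs = angle.encode(docs, to_numpy=True)

# The query is encoded with the retrieval prompt Prompts.C
angle.set_prompt(prompt=Prompts.C)
query_vec = angle.encode({'text': 'What does AnglE optimize?'}, to_numpy=True)[0]

# Rank documents by cosine similarity to the query
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for i in np.argsort(-scores):
    print(f'{scores[i]:.3f}  {docs[i]}')
```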
Citation
If you use our pre-trained models, please support us by citing our work:
```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```
Evaluation results
All scores are self-reported on the respective MTEB test sets.

| Dataset | Metric | Score |
|---|---|---|
| AmazonCounterfactualClassification (en) | accuracy | 75.552 |
| AmazonCounterfactualClassification (en) | ap | 38.264 |
| AmazonCounterfactualClassification (en) | f1 | 69.410 |
| AmazonPolarityClassification | accuracy | 92.843 |
| AmazonPolarityClassification | ap | 89.576 |
| AmazonPolarityClassification | f1 | 92.826 |
| AmazonReviewsClassification (en) | accuracy | 48.292 |
| AmazonReviewsClassification (en) | f1 | 47.903 |
| ArguAna | map_at_1 | 42.105 |
| ArguAna | map_at_10 | 58.181 |
| ArguAna | map_at_100 | 58.654 |
| ArguAna | map_at_1000 | 58.657 |
| ArguAna | map_at_3 | 54.386 |
| ArguAna | map_at_5 | 56.758 |
| ArguAna | mrr_at_1 | 42.745 |
| ArguAna | mrr_at_10 | 58.437 |
| ArguAna | mrr_at_100 | 58.895 |
| ArguAna | mrr_at_1000 | 58.898 |
| ArguAna | mrr_at_3 | 54.635 |
| ArguAna | mrr_at_5 | 57.000 |
| ArguAna | ndcg_at_1 | 42.105 |
| ArguAna | ndcg_at_10 | 66.150 |
| ArguAna | ndcg_at_100 | 68.048 |
| ArguAna | ndcg_at_1000 | 68.114 |
| ArguAna | ndcg_at_3 | 58.477 |
| ArguAna | ndcg_at_5 | 62.768 |
| ArguAna | precision_at_1 | 42.105 |
| ArguAna | precision_at_10 | 9.111 |
| ArguAna | precision_at_100 | 0.991 |
| ArguAna | precision_at_1000 | 0.100 |
| ArguAna | precision_at_3 | 23.447 |
| ArguAna | precision_at_5 | 16.159 |
| ArguAna | recall_at_1 | 42.105 |
| ArguAna | recall_at_10 | 91.110 |
| ArguAna | recall_at_100 | 99.147 |
| ArguAna | recall_at_1000 | 99.644 |
| ArguAna | recall_at_3 | 70.341 |
| ArguAna | recall_at_5 | 80.797 |
| ArxivClusteringP2P | v_measure | 49.026 |
| ArxivClusteringS2S | v_measure | 43.094 |
| AskUbuntuDupQuestions | map | 64.196 |
| AskUbuntuDupQuestions | mrr | 77.095 |
| BIOSSES | cos_sim_pearson | 87.867 |
| BIOSSES | cos_sim_spearman | 86.142 |
| BIOSSES | euclidean_pearson | 85.990 |
| BIOSSES | euclidean_spearman | 86.482 |
| BIOSSES | manhattan_pearson | 85.645 |
| BIOSSES | manhattan_spearman | 86.210 |
| Banking77Classification | accuracy | 87.692 |
| Banking77Classification | f1 | 87.681 |
| BiorxivClusteringP2P | v_measure | 39.375 |