Universal AnglE Embedding
📢 WhereIsAI/UAE is licensed under MIT. Feel free to use it in any scenario.
Follow us on:
- GitHub: https://github.com/SeanLee97/AnglE
- arXiv: https://arxiv.org/abs/2309.12871
🔥 Our universal English sentence embedding WhereIsAI/UAE-Large-V1
achieves SOTA on the MTEB Leaderboard with an average score of 64.64!
Usage
1. angle_emb
python -m pip install -U angle-emb
- Non-Retrieval Tasks
There is no need to specify any prompts.
from angle_emb import AnglE
from scipy import spatial
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
doc_vecs = angle.encode([
    'The weather is great!',
    'The weather is very good!',
    'i am going to bed'
])

for i, dv1 in enumerate(doc_vecs):
    for dv2 in doc_vecs[i + 1:]:
        print(1 - spatial.distance.cosine(dv1, dv2))
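For larger batches, the pairwise loop above can be vectorized. A minimal sketch with NumPy, where the hard-coded `doc_vecs` stands in for the matrix returned by `angle.encode`:

```python
import numpy as np

# Stand-in for the (n_docs x dim) matrix returned by angle.encode.
doc_vecs = np.array([
    [0.1, 0.3, 0.5],
    [0.2, 0.4, 0.4],
    [0.9, 0.1, 0.0],
])

# L2-normalize each row; a single matmul then yields every pairwise
# cosine similarity at once instead of one scipy call per pair.
unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
sims = unit @ unit.T

print(sims)  # sims[i, j] is the cosine similarity of doc i and doc j
```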
- Retrieval Tasks
For retrieval purposes, please use the prompt Prompts.C for queries (not for documents).
from angle_emb import AnglE, Prompts
from scipy import spatial
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
qv = angle.encode(Prompts.C.format(text='what is the weather?'))
doc_vecs = angle.encode([
    'The weather is great!',
    'it is rainy today.',
    'i am going to bed'
])

for dv in doc_vecs:
    print(1 - spatial.distance.cosine(qv[0], dv))
2. sentence-transformers
from angle_emb import Prompts
from scipy import spatial
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("WhereIsAI/UAE-Large-V1").cuda()
qv = model.encode(Prompts.C.format(text='what is the weather?'))
doc_vecs = model.encode([
    'The weather is great!',
    'it is rainy today.',
    'i am going to bed'
])

for dv in doc_vecs:
    print(1 - spatial.distance.cosine(qv, dv))
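To turn the per-document scores into a ranking (for either usage path above), sorting by descending cosine similarity is enough. A hedged sketch with placeholder vectors; in practice `qv` and `doc_vecs` come from `angle.encode` / `model.encode`:

```python
import numpy as np

docs = ['The weather is great!', 'it is rainy today.', 'i am going to bed']

# Placeholder embeddings, NOT real model output.
qv = np.array([1.0, 0.2, 0.0])
doc_vecs = np.array([
    [0.9, 0.3, 0.1],
    [0.6, 0.1, 0.7],
    [0.0, 0.1, 0.9],
])

# Cosine similarity of the query against every document.
sims = doc_vecs @ qv / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(qv))

# Print documents from most to least similar to the query.
for idx in np.argsort(-sims):
    print(f'{sims[idx]:.4f}  {docs[idx]}')
```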
Citation
If you use our pre-trained models, please consider supporting us by citing our work:
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
Evaluation results
All scores are self-reported on the corresponding MTEB test sets.

| Dataset | Metric | Value |
|---|---|---|
| AmazonCounterfactualClassification (en) | accuracy | 75.552 |
| AmazonCounterfactualClassification (en) | ap | 38.264 |
| AmazonCounterfactualClassification (en) | f1 | 69.410 |
| AmazonPolarityClassification | accuracy | 92.843 |
| AmazonPolarityClassification | ap | 89.576 |
| AmazonPolarityClassification | f1 | 92.826 |
| AmazonReviewsClassification (en) | accuracy | 48.292 |
| AmazonReviewsClassification (en) | f1 | 47.903 |
| ArxivClusteringP2P | v_measure | 49.026 |
| ArxivClusteringS2S | v_measure | 43.094 |
| AskUbuntuDupQuestions | map | 64.196 |
| AskUbuntuDupQuestions | mrr | 77.095 |
| BIOSSES | cos_sim_pearson | 87.867 |
| BIOSSES | cos_sim_spearman | 86.142 |
| BIOSSES | euclidean_pearson | 85.990 |
| BIOSSES | euclidean_spearman | 86.482 |
| BIOSSES | manhattan_pearson | 85.645 |
| BIOSSES | manhattan_spearman | 86.210 |
| Banking77Classification | accuracy | 87.692 |
| Banking77Classification | f1 | 87.681 |
| BiorxivClusteringP2P | v_measure | 39.375 |

ArguAna (retrieval, test set), by cutoff:

| Metric | @1 | @3 | @5 | @10 | @100 | @1000 |
|---|---|---|---|---|---|---|
| map | 42.105 | 54.386 | 56.758 | 58.181 | 58.654 | 58.657 |
| mrr | 42.745 | 54.635 | 57.000 | 58.437 | 58.895 | 58.898 |
| ndcg | 42.105 | 58.477 | 62.768 | 66.150 | 68.048 | 68.114 |
| precision | 42.105 | 23.447 | 16.159 | 9.111 | 0.991 | 0.100 |
| recall | 42.105 | 70.341 | 80.797 | 91.110 | 99.147 | 99.644 |