Transformers
English
Inference Endpoints
Edit model card

AnglE📐: Angle-optimized Text Embeddings

It is Angle 📐, not Angel 👼.

🔥 A New SOTA Model for Semantic Textual Similarity!

Github: https://github.com/SeanLee97/AnglE

https://arxiv.org/abs/2309.12871

PWC PWC PWC PWC PWC PWC PWC

STS Results

Model ATEC BQ LCQMC PAWSX STS-B SOHU-dd SOHU-dc Avg.
^shibing624/text2vec-bge-large-chinese 38.41 61.34 71.72 35.15 76.44 71.81 63.15 59.72
^shibing624/text2vec-base-chinese-paraphrase 44.89 63.58 74.24 40.90 78.93 76.70 63.30 63.08
SeanLee97/angle-roberta-wwm-base-zhnli-v1 49.49 72.47 78.33 59.13 77.14 72.36 60.53 67.06
SeanLee97/angle-llama-7b-zhnli-v1 50.44 71.95 78.90 56.57 81.11 68.11 52.02 65.59

^ denotes baselines, their results are retrieved from https://github.com/shibing624/text2vec

Usage

from angle_emb import AnglE, Prompts

angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-zhnli-v1')
# 请选择对应的 prompt,此模型对应 Prompts.B
print('All predefined prompts:', Prompts.list_prompts())
angle.set_prompt(prompt=Prompts.B)
print('prompt:', angle.prompt)
vec = angle.encode({'text': '你好世界'}, to_numpy=True)
print(vec)
vecs = angle.encode([{'text': '你好世界1'}, {'text': '你好世界2'}], to_numpy=True)
print(vecs)

Citation

You are welcome to use our code and pre-trained models. If you use our code and pre-trained models, please support us by citing our work as follows:

@article{li2023angle,
  title={AnglE-Optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
Downloads last month
0
Unable to determine this model’s pipeline type. Check the docs .

Datasets used to train SeanLee97/angle-llama-7b-zhnli-v1