AnglE📐: Angle-optimized Text Embeddings

It is Angle 📐, not Angel 👼.

🔥 A New SOTA Model for Semantic Textual Similarity!

Github: https://github.com/SeanLee97/AnglE

STS Results

Model	STS12	STS13	STS14	STS15	STS16	STSBenchmark	SICKRelatedness	Avg.
SeanLee97/angle-llama-7b-nli-20231027	78.68	90.58	85.49	89.56	86.91	88.92	81.18	85.90
SeanLee97/angle-llama-7b-nli-v2	79.00	90.56	85.79	89.43	87.00	88.97	80.94	85.96

Usage

use AnglE

python -m pip install -U angle-emb

from angle_emb import AnglE, Prompts

# init
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2')

# set prompt
print('All predefined prompts:', Prompts.list_prompts())
angle.set_prompt(prompt=Prompts.A)
print('prompt:', angle.prompt)

# encode text
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec)
vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True)
print(vecs)

use transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = 'SeanLee97/angle-llama-7b-nli-20231027'
config = PeftConfig.from_pretrained(peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path).bfloat16().cuda()
model = PeftModel.from_pretrained(model, peft_model_id).cuda()

def decorate_text(text: str):
    return f'Summarize sentence "{text}" in one word:"'

inputs = 'hello world!'
tok = tokenizer([decorate_text(inputs)], return_tensors='pt')
for k, v in tok.items():
    tok[k] = v.cuda()
vec = model(output_hidden_states=True, **tok).hidden_states[-1][:, -1].float().detach().cpu().numpy()
print(vec)

Citation

You are welcome to use our code and pre-trained models. If you use our code and pre-trained models, please support us by citing our work as follows:

@article{li2023angle,
  title={AnglE-Optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}

SeanLee97
/

angle-llama-7b-nli-v2

AnglE📐: Angle-optimized Text Embeddings

Usage

Citation

Datasets used to train SeanLee97/angle-llama-7b-nli-v2

Collection including SeanLee97/angle-llama-7b-nli-v2

AnglE NLI Sentence Embeddings