Sentence Transformers integration

#2
by tomaarsen HF staff - opened

Hello!

Pull Request overview

  • Add Sentence Transformers integration.

Details

This PR adds proper support in Sentence Transformers, i.e. the package often used in third party embedding applications. It abstracts away a lot of the transformers code from the user, and instead hides it in the configuration. As a result, the user can just use:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

instead of manually loading both the model and the tokenizer, adding the query prompt themselves, computing the token embeddings & then taking the CLS embedding and then doing normalization.

P.s. Sentence Transformers is being maintained by Hugging Face.

  • Tom Aarsen
tomaarsen changed pull request status to open
spacemanidol changed pull request status to merged

Sign up or log in to comment