Snowflake/snowflake-arctic-embed-l · Sentence Transformers integration

tomaarsen

Apr 16, 2024 (edited Apr 16, 2024)

Hello!

Pull Request overview

Add Sentence Transformers integration.

Details

This PR adds proper support for Sentence Transformers, the package commonly used in third-party embedding applications. It abstracts much of the transformers code away from the user and moves it into the model's configuration instead. As a result, the user can simply use:

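```python
from sentence_transformers import SentenceTransformer

# Loads the model together with its Sentence Transformers configuration
# (including the stored "query" prompt)
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

# Queries get the "query" prompt prepended; documents are encoded as-is
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
```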

instead of manually loading both the model and the tokenizer, prepending the query prompt themselves, computing the token embeddings, taking the CLS token embedding, and normalizing it.
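To make the manual steps concrete, the post-processing that the integration takes over can be sketched in plain NumPy. This is a minimal, hypothetical illustration under stated assumptions, not the Sentence Transformers implementation: `apply_prompt` and `cls_pool_and_normalize` are made-up helper names, random arrays stand in for the transformer's token embeddings (no model is loaded), and the query prompt string is assumed from the model card rather than copied from this PR.

```python
import numpy as np

# Hypothetical stand-in for the prompt stored in the model's configuration;
# the exact string is an assumption taken from the model card.
PROMPTS = {"query": "Represent this sentence for searching relevant passages: "}


def apply_prompt(texts, prompt_name=None):
    """Prepend the named prompt to every text (no prompt if prompt_name is None)."""
    prefix = PROMPTS[prompt_name] if prompt_name is not None else ""
    return [prefix + text for text in texts]


def cls_pool_and_normalize(token_embeddings):
    """Keep the first ([CLS]) token of each sequence and L2-normalize it.

    token_embeddings: array of shape (batch, seq_len, hidden), i.e. the
    transformer's last hidden state.
    """
    cls = token_embeddings[:, 0, :]
    return cls / np.linalg.norm(cls, axis=1, keepdims=True)


# Hypothetical token embeddings standing in for real model output
# (2 texts, 4 tokens each, hidden size 3); only the first token is kept.
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(2, 4, 3))
document_tokens = rng.normal(size=(2, 4, 3))

query_embeddings = cls_pool_and_normalize(query_tokens)
document_embeddings = cls_pool_and_normalize(document_tokens)

# With unit-length embeddings, the dot product equals cosine similarity.
scores = query_embeddings @ document_embeddings.T
print(apply_prompt(["what is snowflake?"], prompt_name="query"))
print(scores.shape)  # (2, 2)
```

Because both sets of embeddings have unit length, `scores[i, j]` is the cosine similarity between query `i` and document `j`.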

P.S. Sentence Transformers is maintained by Hugging Face.

Tom Aarsen

Add Sentence Transformers integration + README (fc5461e0)

tomaarsen changed pull request status to open Apr 16, 2024

spacemanidol changed pull request status to merged Apr 16, 2024