Feature extraction output dimensions: how to use as a sentence embedding?

by YorelNation - opened

Hi,

I'm currently using "dangvantuan/sentence-camembert-large" as an embedding model to transform a sentence into a one-dimensional vector of length 1024.

Example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dangvantuan/sentence-camembert-large")
embedding = model.encode("Ceci est une phrase de test.")
embedding.shape
(1024,)

I tried to use the transformer library to use your model with a "Feature extraction" task:

from transformers import pipeline

pipe = pipeline("feature-extraction", model="OrdalieTech/Solon-embeddings-large-0.1")
embedding = pipe("Ceci est une phrase de test.")

len(embedding[0]) # 9
len(embedding[0][0]) # 1024


Here the embedding has shape 9x1024 (nine lists of 1024 floats), i.e. one 1024-dimensional vector per token.

My question is: how can I actually use this representation as a replacement for my current model to compute text similarities via a vector DB? Can't I convert it to a 1x1024 vector?
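A common way to collapse per-token outputs into a single sentence vector is mean pooling over the token axis. Here is a minimal sketch with numpy on a simulated pipeline output (whether Solon was actually trained with mean pooling rather than, say, CLS pooling is an assumption; the model card on the Hub should say which pooling it expects):

```python
import numpy as np

def mean_pool(pipe_output):
    """Collapse the per-token output of a 'feature-extraction' pipeline
    (nested lists of shape [1, n_tokens, hidden_size]) into one sentence vector."""
    token_embeddings = np.asarray(pipe_output[0])        # (n_tokens, hidden_size)
    sentence_embedding = token_embeddings.mean(axis=0)   # (hidden_size,)
    # L2-normalise so cosine similarity reduces to a dot product in the vector DB
    return sentence_embedding / np.linalg.norm(sentence_embedding)

# Simulated pipeline output: 9 tokens x 1024 dims, as in the post above
fake_output = [np.random.rand(9, 1024).tolist()]
vec = mean_pool(fake_output)
print(vec.shape)  # (1024,)
```

For batched inputs with padding you would also want to weight the mean by the attention mask so that pad tokens don't dilute the average; the sentence-transformers library does this internally.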

Thank you in advance; maybe it's a dumb question :)

Actually, maybe I could use your 9x1024 representation in my vector database without needing to flatten it?

EDIT: Vector databases do indeed need a one-dimensional vector.

[attached image: embedding_for_solon.JPG]

Would this implementation be considered valid for embedding sentences and paragraphs?
