Which embedding vector to use?

opened by moooji

Hi,

We would like to use the model for document embedding and retrieval, but we noticed that the model returns 3 different vectors (each 1024-dimensional). What is the difference between them, and which of them should we use for retrieval with cosine similarity? Or do we have to compute the final feature vector from those 3 in a separate step?


I'm not sure about your problem; the code at https://huggingface.co/intfloat/e5-large-v2#usage only produces one vector for each input text.

Can you provide more details on how you compute the embeddings?

Thank you for your quick reply! What I mean is that when we deploy the model to "Inference Endpoints" or use the "Hosted Inference API" on the Hugging Face model page, it outputs multiple vectors.

[Screenshot: Hosted Inference API output showing multiple 1024-dimensional vectors]

So it's a bit unclear to us what the output of the "Hosted Inference API" represents. For example, for the input "passage: E5 is awesome", it returns 8 vectors like this:

[
  [
    [1024], 
    [1024], 
    [1024],
    [1024],
    [1024],
    [1024],
    [1024],
    [1024]
  ]
]
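For context, this is roughly how we call the API and inspect the response shape (a sketch; the token is a placeholder, and we're assuming the standard feature-extraction request payload):

```python
import requests
import numpy as np

API_URL = "https://api-inference.huggingface.co/models/intfloat/e5-large-v2"
headers = {"Authorization": "Bearer hf_..."}  # placeholder token

resp = requests.post(
    API_URL, headers=headers,
    json={"inputs": "passage: E5 is awesome"},
).json()

# For this input: (1, 8, 1024), i.e. batch x tokens x hidden size
print(np.array(resp).shape)
```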

Is this maybe one vector per token?
Do I understand it correctly that we would have to average those to get one feature vector?

This Inference API is automatically set up by Hugging Face; it looks like it returns the last-layer hidden states.

Yes, please follow our demo code to average them into one vector and use cosine similarity for retrieval.
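For convenience, here is a condensed sketch along the lines of the usage example linked above (the masked mean pool plus cosine similarity; the example texts are placeholders):

```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padding positions, then average the remaining token vectors.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-large-v2')
model = AutoModel.from_pretrained('intfloat/e5-large-v2')

texts = ['query: how does E5 work?', 'passage: E5 is awesome']
batch = tokenizer(texts, max_length=512, padding=True, truncation=True,
                  return_tensors='pt')

outputs = model(**batch)
embeddings = average_pool(outputs.last_hidden_state, batch['attention_mask'])

# L2-normalize so the dot product equals cosine similarity.
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = embeddings[:1] @ embeddings[1:].T
print(scores)
```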

Is it possible to edit the Hugging Face model to do that instead of having clients do it?

I am not aware of any way to do it automatically. Prepending a string prefix should be fairly trivial to do on the client side.
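For example, a minimal client-side sketch (the helper name is hypothetical):

```python
def with_prefix(text: str, is_query: bool) -> str:
    # E5 expects "query: " for queries and "passage: " for documents.
    return ("query: " if is_query else "passage: ") + text

print(with_prefix("how does E5 work?", is_query=True))  # query: how does E5 work?
print(with_prefix("E5 is awesome", is_query=False))     # passage: E5 is awesome
```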

The same thing happens when deploying it on SageMaker with `from sagemaker.huggingface.model import HuggingFaceModel`.
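Until an endpoint does the pooling server-side, a client-side workaround sketch (assuming `resp` is the parsed `[batch, tokens, 1024]` response from either endpoint):

```python
import numpy as np

def pool_response(resp):
    hidden = np.asarray(resp)   # (batch, tokens, 1024)
    emb = hidden.mean(axis=1)   # average over the token axis
    # L2-normalize so dot products are cosine similarities.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)
```

Note that this plain mean is only correct when the batch has no padding; for padded batches, weight by the attention mask as in the demo code above.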

