Temperature-scaled cosine similarity function?

#4
by nalzok - opened

In the code example, you measure the similarity between two sentences with the inner product between normalized embeddings. However, in Section 3.2 of your technical report, you wrote "In this paper, we adopt the temperature-scaled cosine similarity function as follows".

I have two sets of documents in my use case, denoted A and B. The task is to find the top-k documents in B that are most semantically similar to those in A (by computing the average distance between each document in B and all documents in A). Which distance measure between two embeddings do you recommend: traditional cosine distance, or the temperature-scaled cosine similarity function?
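For concreteness, here is a minimal sketch of what I mean, assuming the embeddings have already been computed and L2-normalized (the array names and shapes are made up):

```python
import numpy as np

# Hypothetical setup: emb_A and emb_B stand in for L2-normalized embedding
# matrices of shape (num_A, dim) and (num_B, dim) produced by the model.
rng = np.random.default_rng(0)
emb_A = rng.standard_normal((8, 1024))
emb_B = rng.standard_normal((100, 1024))
emb_A /= np.linalg.norm(emb_A, axis=1, keepdims=True)
emb_B /= np.linalg.norm(emb_B, axis=1, keepdims=True)

# Cosine similarity of every document in B against every document in A,
# then averaged over A to give one score per document in B.
sims = emb_B @ emb_A.T              # shape (num_B, num_A)
mean_sims = sims.mean(axis=1)       # shape (num_B,)

# Top-k documents in B that are, on average, most similar to A.
k = 5
top_k_idx = np.argsort(-mean_sims)[:k]
print(top_k_idx)
```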

As a bonus question, do you think it would be better to prepend the instructions to the documents in A in addition to those in B, i.e. to documents as well as queries? After all, this is a symmetric task, and I suppose some symmetry would help.

nalzok changed discussion title from Question about code sample to Temperature-scaled cosine similarity function?
Owner

Hi @nalzok ,

The inner product between normalized embeddings is mathematically equivalent to the cosine similarity function, so they are the same thing. The temperature in the paper is just a constant that scales the cosine scores in the training loss; it does not change the ranking of similarities at inference time.
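A quick sketch with random vectors to illustrate (not from the actual codebase):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)

# Cosine similarity computed directly.
cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Inner product after L2-normalizing both vectors: the same number.
inner = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))
assert np.isclose(cos, inner)

# Dividing by a temperature (a small constant used in the training loss)
# only rescales the score; it never changes which pair ranks higher.
temperature = 0.01  # example value
scaled = cos / temperature
```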

About the instructions: yes, for symmetric tasks such as STS we add the instruction to both sides (see https://github.com/microsoft/unilm/blob/78b3a48de27c388a0212cfee49fd6dc470c9ecb5/e5/mteb_except_retrieval_eval.py#L68).
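A minimal sketch of what that looks like for a pair of sentences (the helper name and the task description here are illustrative):

```python
# Apply the same instruction template to both sides of a symmetric task
# such as STS before encoding.
def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'

task = 'Retrieve semantically similar text.'  # example STS-style instruction

sentence_a = 'A man is playing a guitar.'
sentence_b = 'Someone is strumming a guitar on stage.'

# Both inputs get the instruction prefix before being encoded.
input_a = get_detailed_instruct(task, sentence_a)
input_b = get_detailed_instruct(task, sentence_b)
```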

Thanks for the reply! As a follow-up question: what's the recommended way to process long texts, particularly those with multiple lines? I'm asking because your template f'Instruct: {task_description}\nQuery: {query}' includes a \n character. Would the newline characters in query interfere with the template?

Owner

No, it is okay to include \n in either the query or the documents.
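For example (a small sketch; the task description is just an example):

```python
# A multi-line query inserted into the same template. The newline characters
# inside the query are treated as ordinary text and do not break the template.
task = 'Given a web search query, retrieve relevant passages that answer the query'
query = 'First line of a long query.\nSecond line of the same query.'
prompt = f'Instruct: {task}\nQuery: {query}'
print(prompt)
```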

nalzok changed discussion status to closed
