Example in README could use an explanation

#9
by tordbb - opened

Hi there,
In the example used, it seems you encode the same list of sentences twice, then you compare those (identical) encodings. Is this what the example intended to show?

embeddings_1 = model.encode(sentences)
embeddings_2 = model.encode(sentences)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

I had hoped to see an example of how to compare the two sentences in that list with each other, which could have looked something like this:

embeddings_1 = model.encode(sentences[0])
embeddings_2 = model.encode(sentences[1])
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
Beijing Academy of Artificial Intelligence org

Hi, thanks for your interest!
We wanted to show how to compute the similarities between two lists of sentences. You can replace sentences in embeddings_2 = model.encode(sentences) with any other list of sentences. This is helpful when you need to find the most similar sentence in a list of sentences.
Your method is correct for computing the similarity between sentences[0] and sentences[1].
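
To make the shape of the result concrete, here is a minimal sketch of the "find the most similar sentence" use case. It uses small hand-written vectors as stand-ins for the output of model.encode (the values are hypothetical, not real model output) and assumes the embeddings are L2-normalized, so that the dot product equals cosine similarity:

```python
import numpy as np

def normalize(x):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical stand-ins for model.encode(sentences_1) and model.encode(sentences_2).
embeddings_1 = normalize(np.array([[1.0, 0.0, 0.0],
                                   [0.0, 1.0, 1.0]]))
embeddings_2 = normalize(np.array([[0.9, 0.1, 0.0],
                                   [0.0, 0.2, 1.0],
                                   [0.0, 1.0, 0.9]]))

# Entry [i, j] is the similarity between sentence i of the first list
# and sentence j of the second list, so the shape here is (2, 3).
similarity = embeddings_1 @ embeddings_2.T

# For each sentence in the first list, the index of its closest match
# in the second list.
best_match = similarity.argmax(axis=1)
print(similarity.shape)
print(best_match)
```

With two different lists, each row of the matrix ranks the candidates for one query. With the same list on both sides, as in the current README, the diagonal is each sentence compared with itself.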

Thanks for explaining this, that makes sense!
What threw me off was that you create both embeddings_1 and embeddings_2 from the same variable, sentences.
If you want to make the usefulness of this model more obvious to new readers, you may want to change the example to the following:

from FlagEmbedding import FlagModel
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4", "样例数据-5"]
model = FlagModel('BAAI/bge-large-zh', query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:")
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)

PS: Thank you for your work on this model. I just used it to compare the caption generated from an image against the prompt from which the image was generated.
It seems to have done a good job!

Beijing Academy of Artificial Intelligence org

Thanks for your advice! I will update the readme.

Any recommendations for reproducing the model for Spanish?
