STS (Semantic Textual Similarity) results

#37
by JoannaSapiecha - opened

The STS results that I got using this model are worse than those I got with some SBERT models. The cosine similarity score is almost always high or very high (above 0.75, never below 0.5), for both short and longer texts.

Yes, the absolute scores are not meaningful on their own. The embeddings tend to concentrate in a narrow cone of the vector space, so the cosine similarity scores are almost always very high. You can only compare similarity scores against each other when they were obtained with the same query (anchor) text. Only the differences between those similarity values carry information.
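To illustrate the point, here is a minimal sketch using synthetic vectors rather than real model output. It assumes the "cone" can be imitated by adding one shared offset to every vector; the offset size, noise scale, and names such as `shared`, `related`, and `unrelated` are made up for the example and say nothing about the actual embedding geometry of the model.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
dim = 64

# Assumption: a large component shared by all vectors stands in for the cone.
shared = np.full(dim, 3.0)

anchor = shared + rng.normal(size=dim)
related = anchor + 0.3 * rng.normal(size=dim)  # small perturbation of the anchor
unrelated = shared + rng.normal(size=dim)      # independent noise, same offset

scores = {
    "related": cosine(anchor, related),
    "unrelated": cosine(anchor, unrelated),
}

# Both scores come out high in absolute terms, yet their order
# relative to the same anchor still separates the two candidates.
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In this toy setup even the unrelated vector scores well above 0.5, so a fixed threshold would not separate the two; the gap between the scores for the same anchor does.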

Thank you @michael-guenther for the hint. So I will use the Jina base model to find 'the best match(es)' for a given text (anchor).
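A best-match lookup of this kind could be sketched as below. The helper `top_k_matches` and the tiny hand-written vectors are hypothetical placeholders; in practice the anchor and candidate vectors would be whatever embeddings the model returns for the texts.

```python
import numpy as np


def top_k_matches(anchor, candidates, k=3):
    """Return (index, score) pairs for the k candidates closest to the anchor.

    Scores are cosine similarities and are only meant to be compared
    with each other, since they all share the same anchor.
    """
    anchor = np.asarray(anchor, dtype=float)
    candidates = np.asarray(candidates, dtype=float)

    # Normalise to unit length so a dot product equals cosine similarity.
    anchor_unit = anchor / np.linalg.norm(anchor)
    cand_unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)

    scores = cand_unit @ anchor_unit
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical 3-dimensional "embeddings", for illustration only.
anchor_vec = np.array([1.0, 0.0, 1.0])
candidate_vecs = np.array([
    [1.0, 0.1, 0.9],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.2],
])

best = top_k_matches(anchor_vec, candidate_vecs, k=2)
print(best)
```

The function returns a ranking rather than a yes/no decision against a threshold, which matches the advice to rely on score differences for a single anchor.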

bwang0911 changed discussion status to closed
