STS (Semantic Textual Similarity) results

#37

by JoannaSapiecha - opened Dec 21, 2023

Dec 21, 2023

The STS results I got with this model are worse than those I got with some SBERT models. The cosine similarity score is almost always high or very high (above 0.75, never below 0.5), for both short and longer texts.

michael-guenther

Jina AI org Dec 21, 2023


edited Dec 21, 2023

Yes, the absolute scores are not meaningful. The embeddings tend to concentrate in a narrow cone of the vector space, so the scores are almost always very high. Similarity scores can only be compared with each other when they were computed against the same query (anchor) text; only the differences between those scores carry information.
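A toy illustration of the cone effect, using random vectors instead of real model outputs (the shared offset, the dimension of 64, and the seed are arbitrary assumptions, not properties of the Jina model): adding the same component to otherwise unrelated vectors pushes every pairwise cosine score toward 1, which matches the "always above 0.75" observation in the opening post.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_embeddings(n, dim, offset, seed=0):
    """Hypothetical embeddings: independent Gaussian noise plus a shared offset.

    A large `offset` pushes every vector toward the all-ones direction,
    i.e. into a narrow cone. The same seed yields the same noise, so only
    the offset differs between calls.
    """
    rng = random.Random(seed)
    return [[offset + rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def pairwise_scores(vectors):
    """Cosine similarity for every unordered pair of vectors."""
    return [
        cosine(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]


# Unrelated "texts" with no shared component: scores scatter around 0.
isotropic = pairwise_scores(toy_embeddings(n=6, dim=64, offset=0.0))
# The same unrelated "texts" squeezed into a cone: every score is high.
anisotropic = pairwise_scores(toy_embeddings(n=6, dim=64, offset=3.0))

print(f"no shared offset: min={min(isotropic):.2f} max={max(isotropic):.2f}")
print(f"shared offset:    min={min(anisotropic):.2f} max={max(anisotropic):.2f}")
```

With the offset, even vectors that share nothing but the common component score around 0.9, so a fixed absolute threshold says little about whether two texts are actually related.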

JoannaSapiecha

Dec 21, 2023


edited Dec 21, 2023

Thank you @michael-guenther for the hint. So I will use the Jina base model to find "the best match(es)" for a given text (anchor).
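A minimal sketch of that "best match(es) for an anchor" usage, assuming the embeddings have already been computed by the model; the 3-dimensional vectors, the candidate labels, and the `best_matches` helper are hypothetical stand-ins, not part of any library. All candidates share a large first component to mimic the cone, and they are only ranked against one anchor rather than judged by an absolute threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def best_matches(
    anchor: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank candidates by cosine similarity to ONE anchor, highest first."""
    scored = [(name, cosine(anchor, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for model embeddings; the shared first
# component plays the role of the cone axis.
anchor = [5.0, 1.0, 0.0]
candidates = {
    "paraphrase": [5.0, 0.9, 0.1],
    "same topic": [5.0, 0.5, 0.5],
    "unrelated": [5.0, 0.0, 1.0],
}

for name, score in best_matches(anchor, candidates, k=3):
    print(f"{name:12s} {score:.3f}")
```

All three scores land above 0.95, so none of them is informative on its own; the ordering relative to the same anchor (paraphrase, then same topic, then unrelated) is what carries the signal.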

bwang0911 changed discussion status to closed Feb 26, 2024
