Best instructions for clustering and semantic similarity

#29

by rmilliere - opened Jun 11

Jun 11

The model card gives an example instruction for retrieval.

What are the recommended instructions to get embeddings optimized for either clustering or sentence similarity instead of retrieval?

nada5

NVIDIA org Jun 11

Thank you for the question. All instruction prefix examples (including clustering, STS, classification, etc.) are available in Table 7 of our NV-Embed paper: https://arxiv.org/pdf/2405.17428

rmilliere

Jun 11

Thanks, I missed that in the appendix.
If anyone else is looking for this information, here are the relevant instructions:

STS: "Retrieve semantically similar text."
Clustering (adjusted for a generic task): "Identify the topic or theme of X" (e.g., "Identify the topic or theme of the given sentences" for a corpus of sentences)
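For anyone wiring these in, here is a minimal sketch of how the two instructions above could be turned into prefixes. The `Instruct: ...\nQuery: ` template is an assumption based on the retrieval example in the model card, and `TASK_INSTRUCTIONS`, `build_prefix`, and `with_instruction` are hypothetical helper names, not part of the model's API. Check the model card for the exact way to pass the prefix to the model.

```python
# Task instructions quoted from this thread (see Table 7 of the NV-Embed paper).
TASK_INSTRUCTIONS = {
    "sts": "Retrieve semantically similar text.",
    "clustering": "Identify the topic or theme of the given sentences",
}


def build_prefix(task: str) -> str:
    # Assumed template: mirrors the retrieval example in the model card.
    return f"Instruct: {TASK_INSTRUCTIONS[task]}\nQuery: "


def with_instruction(task: str, texts: list[str]) -> list[str]:
    # Prepend the same task prefix to every input text.
    prefix = build_prefix(task)
    return [prefix + text for text in texts]


if __name__ == "__main__":
    for line in with_instruction("clustering", ["Cats purr.", "Stocks fell."]):
        print(repr(line))
```

The resulting strings (or just the prefix, if the encoder accepts it as a separate argument) would then be passed to the embedding model in place of the retrieval prefix.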
