For text similarity, which pooling strategy is better?

#17
by zhihengyu - opened

Hi,

Which pooling strategy is better for text similarity: CLS, MEAN, or MAX?

CLS-strategy: as in the demo, using the output vector of the CLS token
MEAN-strategy: computing the mean of all output vectors
MAX-strategy: computing a max-over-time of the output vectors

Beijing Academy of Artificial Intelligence org

If you want to fine-tune a model from scratch: CLS ~= MEAN > MAX.
For the bge models, you need to use CLS, because we fine-tuned them with CLS pooling.
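
For reference, here is a minimal NumPy sketch of the three strategies from the question. It is only an illustration, not the bge code: the function names `cls_pool`, `mean_pool`, and `max_pool` and the toy arrays are made up for this example, and it assumes the encoder output has shape (batch, tokens, hidden), that the CLS token sits at position 0, and that the attention mask is 1 for real tokens and 0 for padding.

```python
import numpy as np

def cls_pool(hidden, mask):
    # CLS: take the vector at position 0 (assumes the CLS token comes first).
    return hidden[:, 0]

def mean_pool(hidden, mask):
    # MEAN: average the token vectors, ignoring padded positions.
    m = mask[:, :, None].astype(hidden.dtype)
    counts = np.clip(m.sum(axis=1), 1e-9, None)  # avoid division by zero
    return (hidden * m).sum(axis=1) / counts

def max_pool(hidden, mask):
    # MAX: element-wise max over tokens; padded positions are set to -inf
    # so they can never win the max.
    masked = np.where(mask[:, :, None].astype(bool), hidden, -np.inf)
    return masked.max(axis=1)

# Toy "encoder output": 2 sequences, 3 token positions, hidden size 2.
hidden = np.array([[[1.0, 2.0], [3.0, 0.0], [9.0, 9.0]],
                   [[0.0, 1.0], [2.0, 5.0], [4.0, 3.0]]])
# The last position of the first sequence is padding.
mask = np.array([[1, 1, 0],
                 [1, 1, 1]])

cls_vecs = cls_pool(hidden, mask)
mean_vecs = mean_pool(hidden, mask)
max_vecs = max_pool(hidden, mask)
```

With a real model, `hidden` would be the last hidden state of the encoder and `mask` the tokenizer's attention mask. Whichever strategy you pick, it should match the one the model was fine-tuned with, which is why bge needs `cls_pool`.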
