For text similarity, which pooling strategy is better?
#17, opened by zhihengyu
Hi,
Which pooling strategy is better for text similarity: CLS, MEAN, or MAX?
CLS-strategy: as in the demo, using the output of the CLS token
MEAN-strategy: computing the mean of all output vectors
MAX-strategy: computing a max-over-time of the output vectors.
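
For concreteness, here is a minimal sketch of the three strategies on a single sequence. It is not taken from any model's code: the function name `pool`, the NumPy representation, and the convention that the CLS token sits at position 0 with padding marked by a 0/1 attention mask are all assumptions made for illustration.

```python
import numpy as np

def pool(hidden, mask, strategy="cls"):
    """Pool token outputs into one sentence vector (illustrative helper).

    hidden: (seq_len, dim) array of per-token output vectors.
    mask:   (seq_len,) array, 1 for real tokens and 0 for padding.
    """
    hidden = np.asarray(hidden, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if strategy == "cls":
        # Assumes the CLS token is the first token of the sequence.
        return hidden[0]
    if strategy == "mean":
        # Average over real tokens only, ignoring padding.
        return hidden[mask].mean(axis=0)
    if strategy == "max":
        # Element-wise maximum over real tokens (max-over-time).
        return hidden[mask].max(axis=0)
    raise ValueError(f"unknown pooling strategy: {strategy}")

# Toy input: three real tokens followed by one padding position.
hidden = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [9.0, 9.0]])
mask = np.array([1, 1, 1, 0])

cls_vec = pool(hidden, mask, "cls")
mean_vec = pool(hidden, mask, "mean")
max_vec = pool(hidden, mask, "max")
```

Masking matters for MEAN and MAX: without it, the padding row in the toy input would change both results, while CLS is unaffected.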
If you want to fine-tune a model from scratch: CLS ~= MEAN > MAX.
For bge models, you need to use CLS because we fine-tuned the model with CLS pooling.
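
As a rough sketch of what this means for scoring two texts: take the CLS vector of each and compare them. The thread does not say which similarity measure to use, so cosine similarity here is an assumption, as are the helper name `cls_cosine_similarity` and the convention that the CLS token is at position 0. The arrays stand in for the model's per-token outputs.

```python
import numpy as np

def cls_cosine_similarity(hidden_a, hidden_b):
    """Cosine similarity between the CLS vectors of two sequences.

    hidden_a, hidden_b: (seq_len, dim) arrays of per-token outputs.
    Illustrative helper; assumes CLS is the first token.
    """
    a = np.asarray(hidden_a, dtype=float)[0]
    b = np.asarray(hidden_b, dtype=float)[0]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy outputs: only the first row (the CLS position) affects the score.
same = cls_cosine_similarity([[1.0, 0.0], [5.0, 5.0]], [[2.0, 0.0], [-3.0, 2.0]])
orthogonal = cls_cosine_similarity([[1.0, 0.0], [5.0, 5.0]], [[0.0, 2.0], [1.0, 1.0]])
```

Pooling with a different strategy than the one used in fine-tuning would feed the comparison vectors the model was never trained to make meaningful.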