This model is distilled from the bert-base-nli-stsb-mean-tokens pre-trained model from Sentence-Transformers.
The embedding vector is obtained by mean/average pooling of the last layer's hidden states.
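Mean pooling averages the last-layer token vectors and ignores padding positions using the attention mask. The sketch below shows that computation in plain Python. The function name `mean_pool` and the list-of-lists input layout are illustrative assumptions, not this model's or Sentence-Transformers' actual API. The small-epsilon guard mirrors the clamp commonly used in Sentence-Transformers pooling code.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Hypothetical helper: average token vectors where attention_mask == 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding tokens (mask == 0)
            count += 1
            for j, x in enumerate(vec):
                summed[j] += x
    # Avoid division by zero for an all-padding input.
    denom = max(count, 1e-9)
    return [s / denom for s in summed]


# Two real tokens and one padding token: the padding vector is ignored.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]
```

Masking before averaging matters because padded batches would otherwise drag every sentence embedding toward the padding vectors.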
Update 20210325: Added the attention-matrix imitation objective from the TinyBERT paper, and changed the distillation target from bert-base-nli-stsb-mean-tokens (the two have almost the same STSb performance).
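In the TinyBERT paper, the attention imitation loss for a student/teacher layer pair is the mean squared error between corresponding attention matrices, averaged over the h heads: L_attn = (1/h) Σ_i MSE(A_i^S, A_i^T). The sketch below computes that quantity for one layer pair. It assumes both layers have the same number of heads and the same sequence length; which student layers are mapped to which teacher layers, and whether pre-softmax scores are used (the TinyBERT paper reports fitting the unnormalized scores), are not stated in this card. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # one l x l attention matrix


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries of two equally shaped matrices."""
    total, n = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            n += 1
    return total / n


def attention_loss(student_heads: List[Matrix], teacher_heads: List[Matrix]) -> float:
    """Hypothetical TinyBERT-style loss for one layer pair: (1/h) * sum_i MSE(A_i^S, A_i^T)."""
    if len(student_heads) != len(teacher_heads):
        raise ValueError("student and teacher must have the same number of heads")
    h = len(student_heads)
    return sum(mse(s, t) for s, t in zip(student_heads, teacher_heads)) / h


student = [[[0.5, 0.5], [1.0, 0.0]]]  # 1 head, 2x2
teacher = [[[1.0, 0.0], [1.0, 0.0]]]
print(attention_loss(student, teacher))  # 0.125
```

In a full training loop this term would be summed over the mapped layer pairs and combined with the other distillation objectives.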
We compute the cosine similarity between the embeddings of each sentence pair and report its Spearman correlation with the gold similarity scores on the STS benchmark (higher is better):
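The evaluation described above can be sketched with the standard library alone: score each pair by cosine similarity, then take the Spearman correlation (Pearson correlation of ranks, with ties given average ranks) against the gold labels. This is an illustrative reimplementation, not the evaluator actually used for this card; the function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sts_spearman(emb1, emb2, gold: Sequence[float]) -> float:
    """Spearman correlation between pairwise cosine scores and gold STS labels."""
    preds = [cosine(u, v) for u, v in zip(emb1, emb2)]
    return pearson(ranks(preds), ranks(gold))


emb1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb2 = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(sts_spearman(emb1, emb2, [5.0, 3.0, 0.5]))  # close to 1.0
```

Because Spearman correlation depends only on rank order, the cosine scores need not be rescaled to the 0 to 5 STS label range.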