TinyBERT_L-4_H-312_v2 English Sentence Encoder

This model is distilled from the bert-base-nli-stsb-mean-tokens pre-trained model from Sentence-Transformers.

The embedding vector is obtained by mean pooling of the last layer's hidden states.
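
As a minimal sketch of this pooling step, the snippet below averages per-token vectors in plain Python. It assumes padded positions are excluded through an attention mask, which is the usual Sentence-Transformers convention but is not stated in this card. The function name `mean_pool` and the toy inputs are illustrative only and are not part of the model's code.

```python
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors of one sentence, skipping padded positions.

    hidden_states: one vector per token from the last layer (seq_len x hidden).
    attention_mask: 1 for real tokens, 0 for padding (assumed convention).
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Sum each hidden dimension over the kept tokens, then divide by their count.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three token vectors of size 2; the last position is padding.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]
sentence_embedding = mean_pool(toy_hidden, toy_mask)
print(sentence_embedding)  # [2.0, 3.0]
```

With the real model, each token vector would presumably have 312 entries (the H-312 in the model name) rather than 2, and the pooled result is the sentence embedding.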

Update 20210325: Added the attention-matrix imitation objective from the TinyBERT paper, and changed the distillation target from distilbert-base-nli-stsb-mean-tokens to bert-base-nli-stsb-mean-tokens (the two have almost the same STSb performance).

Model Comparison

We compute the cosine similarity between the embeddings of each sentence pair and report the Spearman correlation of those scores with the gold labels on the STS benchmark (higher is better):

Model	Dev	Test
bert-base-nli-stsb-mean-tokens	.8704	.8505
distilbert-base-nli-stsb-mean-tokens	.8667	.8516
TinyBERT_L-4_H-312_v2-distill-AllNLI	.8587	.8283
TinyBERT_L-4_H (20210325)	.8551	.8341
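
The evaluation procedure behind these numbers can be sketched roughly as follows, using only the Python standard library. This is not the script that produced the table: the helper names (`cosine_similarity`, `average_ranks`, `spearman`) and the toy embeddings and gold scores are made up for illustration. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of the hand-written rank correlation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy stand-ins for sentence-pair embeddings and human similarity labels.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [-1.0, 0.0]),
]
toy_gold = [5.0, 1.0, 3.2, 0.0]

predicted = [cosine_similarity(a, b) for a, b in toy_pairs]
score = spearman(predicted, toy_gold)
print(round(score, 4))
```

Because Spearman correlation depends only on the ordering of the scores, the cosine similarities do not need to be rescaled to the range of the gold labels before the comparison.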