# TinyBERT_L-4_H-312_v2 English Sentence Encoder

This is distilled from the `bert-base-nli-stsb-mean-tokens` pre-trained model from [Sentence-Transformers](https://sbert.net/).

The embedding vector is obtained by mean/average pooling of the last layer's hidden states.

Update 20210325: Added the attention matrices imitation objective as in the TinyBERT paper, and the distill target has been changed from `distilbert-base-nli-stsb-mean-tokens` to `bert-base-nli-stsb-mean-tokens` (they have almost the same STSb performance).

## Model Comparison

We compute cosine similarity scores of the embeddings of the sentence pair to get the spearman correlation on the STS benchmark (bigger is better):

|                                      | Dev   | Test  |
| ------------------------------------ | ----- | ----- |
| bert-base-nli-stsb-mean-tokens       | .8704 | .8505 |
| distilbert-base-nli-stsb-mean-tokens | .8667 | .8516 |
| TinyBERT_L-4_H-312_v2-distill-AllNLI | .8587 | .8283 |
| TinyBERT_L-4_H (20210325)            | .8551 | .8341 |