# TinyBERT_L-4_H-312_v2 English Sentence Encoder
This model is distilled from the pre-trained `bert-base-nli-stsb-mean-tokens` model from [Sentence-Transformers](https://sbert.net/).

The sentence embedding is obtained by mean (average) pooling of the last layer's hidden states.
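
A minimal, dependency-free sketch of mean pooling, for illustration only. It assumes that padding positions are excluded via an attention mask (1 = real token, 0 = padding), as is typical for Sentence-Transformers mean pooling. The function name `mean_pool` is hypothetical, and a real pipeline would operate on tensors rather than Python lists.

```python
from typing import List, Sequence


def mean_pool(
    hidden_states: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the last-layer token vectors, skipping padding positions.

    hidden_states: one vector per token (seq_len x hidden_size).
    attention_mask: 1 for real tokens, 0 for padding (assumed convention).
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            for j, value in enumerate(vector):
                total[j] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Toy example: hidden size 2, three tokens, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

For this model each token vector would have 312 dimensions (the `H-312` in its name), so the pooled sentence embedding is also 312-dimensional.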

Update 20210325: Added the attention-matrix imitation objective from the TinyBERT paper, and changed the distillation teacher from `distilbert-base-nli-stsb-mean-tokens` to `bert-base-nli-stsb-mean-tokens` (the two have almost the same STSb performance).
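
The attention-based loss described in the TinyBERT paper works per layer. It averages, over the h attention heads, the mean squared error between the student's and teacher's attention matrices. Each student layer m is paired with teacher layer g(m) = m * N / M under a uniform mapping. The sketch below illustrates that objective in pure Python.

The function names are hypothetical, and several details of how this model was trained are assumptions not stated in this card:

* which teacher layers were matched,
* whether pre- or post-softmax attention scores were compared (the paper reports fitting the unnormalized scores), and
* how this loss was weighted against the other objectives.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # one seq_len x seq_len attention matrix
Layer = Sequence[Matrix]            # one matrix per attention head


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries of two equally shaped matrices."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def attention_loss(student: Layer, teacher: Layer) -> float:
    """(1/h) * sum over heads of MSE(student head, teacher head)."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of heads")
    return sum(mse(s, t) for s, t in zip(student, teacher)) / len(student)


def uniform_layer_map(num_student: int, num_teacher: int) -> List[int]:
    """1-based teacher layer imitated by each student layer: g(m) = m * N / M."""
    step = num_teacher // num_student
    return [m * step for m in range(1, num_student + 1)]


def total_attention_loss(
    student_layers: Sequence[Layer], teacher_layers: Sequence[Layer]
) -> float:
    """Sum the per-layer attention losses under the (assumed) uniform mapping."""
    mapping = uniform_layer_map(len(student_layers), len(teacher_layers))
    return sum(
        attention_loss(student_layers[m], teacher_layers[t - 1])
        for m, t in enumerate(mapping)
    )


# A 4-layer student and a 12-layer BERT-base teacher.
print(uniform_layer_map(4, 12))  # [3, 6, 9, 12]
```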
## Model Comparison
We compute the cosine similarity between the embeddings of each sentence pair and report the Spearman correlation between these similarities and the gold scores on the STS benchmark (higher is better):

| Model                                | Dev   | Test  |
| ------------------------------------ | ----- | ----- |
| bert-base-nli-stsb-mean-tokens       | .8704 | .8505 |
| distilbert-base-nli-stsb-mean-tokens | .8667 | .8516 |
| TinyBERT_L-4_H-312_v2-distill-AllNLI | .8587 | .8283 |
| TinyBERT_L-4_H (20210325)            | .8551 | .8341 |
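
A dependency-free sketch of this evaluation. It computes the cosine similarity for each sentence pair, then the Spearman correlation against the gold scores. Spearman correlation is the Pearson correlation of the ranks, with tied values given their average rank.

The helper names are hypothetical, and the toy embeddings and scores are made up for illustration; they are not STSb data. In practice one would typically call `scipy.stats.spearmanr` instead of hand-rolling the ranking.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_score(pairs: Sequence[Tuple[Vector, Vector]], gold: Sequence[float]) -> float:
    """Spearman correlation between pairwise cosine similarities and gold scores."""
    predicted = [cosine(a, b) for a, b in pairs]
    return spearman(predicted, gold)


# Toy data (not STSb): three embedding pairs with made-up gold scores.
pairs = [([1.0, 0.0], [1.0, 0.1]), ([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 0.5])]
gold = [4.8, 0.5, 3.9]
print(round(sts_score(pairs, gold), 4))  # 1.0
```

Because Spearman correlation depends only on ranks, any monotonic rescaling of the similarity scores leaves the reported numbers unchanged.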