Committed by Ceshine Lee
Commit db9775f
1 Parent(s): 92f50da
New version (distilbert-base to bert-base; attention matrices imitation)
- README.md +5 -1
- pytorch_model.bin +2 -2
README.md
CHANGED
@@ -1,14 +1,18 @@
 # TinyBERT_L-4_H-312_v2 English Sentence Encoder
 
-This is distilled from the `
+This is distilled from the `bert-base-nli-stsb-mean-tokens` pre-trained model from [Sentence-Transformers](https://sbert.net/).
 
 The embedding vector is obtained by mean/average pooling of the last layer's hidden states.
 
+Update 20210325: Added the attention matrices imitation objective as in the TinyBERT paper, and the distill target has been changed from `distilbert-base-nli-stsb-mean-tokens` to `bert-base-nli-stsb-mean-tokens` (they have almost the same STSb performance).
+
 ## Model Comparison
 
 We compute cosine similarity scores of the embeddings of the sentence pair to get the spearman correlation on the STS benchmark (bigger is better):
 
 |                                      | Dev   | Test  |
 | ------------------------------------ | ----- | ----- |
+| bert-base-nli-stsb-mean-tokens       | .8704 | .8505 |
 | distilbert-base-nli-stsb-mean-tokens | .8667 | .8516 |
 | TinyBERT_L-4_H-312_v2-distill-AllNLI | .8587 | .8283 |
+| TinyBERT_L-4_H (20210325)            | .8551 | .8341 |
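Reviewer note: the README text in this diff describes two steps, mean pooling of the last layer's hidden states and scoring sentence pairs by cosine similarity before taking the Spearman correlation against gold labels. The following is a minimal pure-Python sketch of those steps on toy data, not the author's evaluation script. The function names (`mean_pool`, `cosine`, `ranks`, `spearman`) are hypothetical, and masking padded tokens with an attention mask is an assumption that the README does not state.

```python
import math


def mean_pool(hidden_states, attention_mask):
    """Average the token vectors whose mask entry is 1 (masking is an assumption)."""
    total = [0.0] * len(hidden_states[0])
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, v in enumerate(vec):
                total[i] += v
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy usage: three sentence pairs given as already-pooled embeddings,
# with made-up gold similarity labels (not STS benchmark data).
pairs = [([1.0, 0.0], [1.0, 0.1]), ([1.0, 0.0], [0.5, 0.5]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 2.5, 0.0]
predicted = [cosine(a, b) for a, b in pairs]
score = spearman(predicted, gold)
```

In practice the hidden states would come from the model's last layer, and the correlation would normally be computed with an established statistics library rather than hand-rolled ranks.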
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6cab2a4a2fb96d79e17a634bf921aeddd6fe297287ff6ba6464dee261f7e6226
+size 57432311