Ceshine Lee committed on
Commit db9775f
1 Parent(s): 92f50da

New version (distilbert-base to bert-base; attention matrices imitation)
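The commit contains no training code, so the following is only a minimal sketch of what an attention matrices imitation objective can look like, loosely following the attention-based distillation term described in the TinyBERT paper. Everything here is an assumption rather than the author's implementation: the function names are hypothetical, the layer mapping is taken to be uniform (student layer m imitates teacher layer m * N / M), and both models are assumed to expose the same number of attention heads.

```python
from typing import List

Matrix = List[List[float]]  # one attention matrix: [seq_len][seq_len]


def attention_mse(student: Matrix, teacher: Matrix) -> float:
    """Mean squared error between two attention matrices of equal shape."""
    total, count = 0.0, 0
    for s_row, t_row in zip(student, teacher):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


def attention_imitation_loss(
    student_layers: List[List[Matrix]],
    teacher_layers: List[List[Matrix]],
) -> float:
    """Sum, over student layers, of the head-averaged MSE to the mapped teacher layer.

    Assumption: uniform layer mapping. With a 12-layer teacher and a 4-layer
    student, student layers 1..4 imitate teacher layers 3, 6, 9 and 12.
    """
    ratio = len(teacher_layers) // len(student_layers)
    loss = 0.0
    for m, student_heads in enumerate(student_layers, start=1):
        teacher_heads = teacher_layers[m * ratio - 1]  # 1-based layer m * ratio
        per_head = [attention_mse(s, t) for s, t in zip(student_heads, teacher_heads)]
        loss += sum(per_head) / len(per_head)
    return loss
```

In an actual training loop this term would be added to the embedding-level distillation loss and computed on tensors rather than nested lists; the nested-list form is used here only to keep the sketch dependency-free.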

Files changed (2)
  1. README.md +5 -1
  2. pytorch_model.bin +2 -2
README.md CHANGED
@@ -1,14 +1,18 @@
  # TinyBERT_L-4_H-312_v2 English Sentence Encoder
  
- This is distilled from the `distilbert-base-nli-stsb-mean-tokens` pre-trained model from [Sentence-Transformers](https://sbert.net/).
+ This is distilled from the `bert-base-nli-stsb-mean-tokens` pre-trained model from [Sentence-Transformers](https://sbert.net/).
  
  The embedding vector is obtained by mean/average pooling of the last layer's hidden states.
  
+ Update 20210325: Added the attention matrices imitation objective as in the TinyBERT paper, and the distill target has been changed from `distilbert-base-nli-stsb-mean-tokens` to `bert-base-nli-stsb-mean-tokens` (they have almost the same STSb performance).
+
  ## Model Comparison
  
  We compute cosine similarity scores of the embeddings of the sentence pair to get the spearman correlation on the STS benchmark (bigger is better):
  
  | | Dev | Test |
  | ------------------------------------ | ----- | ----- |
+ | bert-base-nli-stsb-mean-tokens | .8704 | .8505 |
  | distilbert-base-nli-stsb-mean-tokens | .8667 | .8516 |
  | TinyBERT_L-4_H-312_v2-distill-AllNLI | .8587 | .8283 |
+ | TinyBERT_L-4_H (20210325) | .8551 | .8341 |
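For readers who want to see how the numbers in the comparison table are produced, here is a minimal, dependency-free sketch of the two steps the README describes: mean pooling of last-layer hidden states into a sentence embedding, then Spearman correlation between cosine similarities and gold scores. The function names are hypothetical, and skipping padding tokens via an attention mask is an assumption; the README only says "mean/average pooling".

```python
import math
from typing import List, Sequence


def mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the last-layer token vectors, skipping padding (mask == 0)."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)
```

To score a model under these assumptions, embed both sentences of every STS pair with `mean_pool`, collect `cosine(a, b)` per pair, and pass that list together with the gold similarity scores to `spearman`.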
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:55cf95c30a8ab86ae85cb790b49781ff8398d664fd7be6f5815ce5aa0e0dd5f6
- size 57432312
+ oid sha256:6cab2a4a2fb96d79e17a634bf921aeddd6fe297287ff6ba6464dee261f7e6226
+ size 57432311
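The `pytorch_model.bin` entry above is a Git LFS pointer rather than the weights themselves: it records the spec version, the SHA-256 object id, and the size in bytes. As a rough sketch (helper names are hypothetical, and reading the whole file into memory is a simplification), a downloaded file could be checked against the new pointer like this:

```python
import hashlib
from typing import Dict

# The pointer contents introduced by this commit, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6cab2a4a2fb96d79e17a634bf921aeddd6fe297287ff6ba6464dee261f7e6226
size 57432311
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: Dict[str, str]) -> bool:
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    algo, _, digest = pointer["oid"].partition(":")
    return (
        algo == "sha256"
        and len(data) == int(pointer["size"])
        and hashlib.sha256(data).hexdigest() == digest
    )
```

For example, `matches_pointer(open("pytorch_model.bin", "rb").read(), parse_lfs_pointer(NEW_POINTER))` should be true only for the file uploaded in this commit; for large files, hashing in chunks would be kinder to memory.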