eli4s committed
Commit 40ba6d4
1 Parent(s): 094a1a2

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -1,6 +1,6 @@
 This model was pretrained on the bookcorpus dataset using knowledge distillation.

- The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 240. Since it has 12 attention heads, the head size (20) is different from the one of the BERT base model (64).
+ The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 256. Since it has 4 attention heads, the head size is 64, just as for the BERT base model.

 The knowledge distillation was performed using multiple loss functions.
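
For context, the updated description corresponds to a standard BERT-style configuration in which the per-head dimension is hidden_size / num_attention_heads = 256 / 4 = 64. A minimal sketch, assuming the Hugging Face transformers library and BERT-base defaults for everything the README does not state (layer count, intermediate size, vocabulary), could look like this:

    # Minimal sketch, not taken from the commit itself; values other than
    # hidden_size=256 and num_attention_heads=4 are assumptions.
    from transformers import BertConfig, BertModel

    config = BertConfig(
        hidden_size=256,         # reduced from BERT base's 768
        num_attention_heads=4,   # reduced from BERT base's 12
        num_hidden_layers=12,    # assumed: same depth as BERT base
        intermediate_size=1024,  # assumed: 4 * hidden_size, as in BERT base
    )

    # Per-head dimension matches BERT base: 256 / 4 == 768 / 12 == 64
    assert config.hidden_size // config.num_attention_heads == 64

    model = BertModel(config)  # randomly initialized; actual weights come from distillation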