eli4s committed on
Commit
28e6b71
1 Parent(s): b40abcb

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -1,7 +1,9 @@
 This model was pretrained on the bookcorpus dataset using knowledge distillation.
 
 The particularity of this model is that, even though it shares the same architecture as BERT, it has a hidden size of 384 (half the hidden size of BERT) and 6 attention heads (hence the same head size as BERT).
+
 The weights of the model were initialized by pruning the weights of bert-base-uncased.
+
 Knowledge distillation was performed using multiple loss functions to fine-tune the model.
 
 PS: the tokenizer is the same as that of bert-base-uncased.
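A note on the pruning claim above: "initialized by pruning the weights of bert-base-uncased" is not spelled out in this diff. A minimal sketch of one plausible reading, keeping the first 384 of BERT's 768 hidden dimensions, follows; the slicing rule and the `intermediate_size` are assumptions, not taken from the actual training code.

```python
# Hypothetical pruning-based initialization: slice every bert-base-uncased
# tensor down to the student's shape. The selection rule (keep the first
# coordinates of each axis) is an assumption; the commit doesn't specify it.
from transformers import BertConfig, BertModel

teacher = BertModel.from_pretrained("bert-base-uncased")  # hidden size 768, 12 heads

student_config = BertConfig(
    hidden_size=384,          # half of BERT's 768
    num_attention_heads=6,    # 384 / 6 = 64, same head size as BERT
    intermediate_size=1536,   # assumption: half of BERT's 3072
)
student = BertModel(student_config)

student_state = student.state_dict()
for name, w in teacher.state_dict().items():
    if name in student_state:
        target_shape = student_state[name].shape
        # Slice each axis of the teacher tensor to the student's size.
        student_state[name] = w[tuple(slice(0, s) for s in target_shape)]
student.load_state_dict(student_state)
```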
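"Multiple loss functions" is likewise unspecified. A common distillation recipe, offered here only as an illustration, combines a temperature-softened KL term on the logits with an MSE term on hidden states (the teacher states would need a learned projection from 768 down to 384 dimensions):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden_proj,
                      temperature=2.0, alpha=0.5):
    # Soft-target loss: KL between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hidden-state loss: teacher states assumed already projected to 384 dims.
    hidden = F.mse_loss(student_hidden, teacher_hidden_proj)
    return alpha * kd + (1 - alpha) * hidden
```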
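Since the tokenizer is unchanged, it can be loaded straight from bert-base-uncased; the student repo id below is a placeholder, as it is not shown in this diff.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # same tokenizer as BERT
# model = AutoModel.from_pretrained("<this-repo-id>")  # placeholder: repo id not in the diff
```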