Update README.md
README.md CHANGED
@@ -1,6 +1,6 @@
 This model was pretrained on the bookcorpus dataset using knowledge distillation.
 
-The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of
+The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 256. Since it has 4 attention heads, the head size is 64 just as for the BERT base model.
 
 The knowledge distillation was performed using multiple loss functions.
 
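The head-size claim added in this commit can be sketched as a quick arithmetic check (the 256 and 4 come from the README text; the 768 and 12 are the standard BERT base values, assumed here for comparison):

```python
# Values stated in the updated README for this distilled model:
hidden_size = 256
num_attention_heads = 4
head_size = hidden_size // num_attention_heads  # 256 / 4 = 64

# BERT base (assumed standard config: hidden size 768, 12 heads) for comparison.
bert_base_head_size = 768 // 12  # 64

# Same per-head dimensionality despite the much smaller hidden size.
assert head_size == bert_base_head_size == 64
```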