---
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
language:
- en
---
# BERT L8-H256 (uncased)
One of the mini BERT models from https://arxiv.org/abs/1908.08962 that the HF team didn't convert. The checkpoint was converted with the original conversion script.

See the original Google repo: https://github.com/google-research/bert
Note: it's not clear if these checkpoints have undergone knowledge distillation.
## Model variants
|      | H=128             | H=256             | H=512               | H=768                        |
|------|-------------------|-------------------|---------------------|------------------------------|
| L=2  | 2/128 (BERT-Tiny) | 2/256             | 2/512               | 2/768                        |
| L=4  | 4/128             | 4/256 (BERT-Mini) | 4/512 (BERT-Small)  | 4/768                        |
| L=6  | 6/128             | 6/256             | 6/512               | 6/768                        |
| L=8  | 8/128             | **8/256**         | 8/512 (BERT-Medium) | 8/768                        |
| L=10 | 10/128            | 10/256            | 10/512              | 10/768                       |
| L=12 | 12/128            | 12/256            | 12/512              | 12/768 (BERT-Base, original) |
## Usage
See other BERT model cards for usage details, e.g. https://huggingface.co/bert-base-uncased
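As a minimal sketch, the checkpoint loads like any other BERT model via the `transformers` library; the model ID below is a placeholder, substitute the actual repository hosting this checkpoint:

```python
from transformers import AutoTokenizer, AutoModel

# Placeholder ID; replace with the repository that hosts this checkpoint.
model_id = "bert_uncased_L-8_H-256"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Sanity-check the architecture: this variant has L=8 layers and H=256 hidden size.
print(model.config.num_hidden_layers, model.config.hidden_size)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 256)
```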
## Citation
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}