---
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
language:
- en
---
# BERT L8-H256 (uncased)
One of the mini BERT models from https://arxiv.org/abs/1908.08962 that the HF team didn't convert. The checkpoint was converted with the original conversion script.

See the original Google repo: https://github.com/google-research/bert
Note: it's not clear if these checkpoints have undergone knowledge distillation.
## Model variants
|      | H=128             | H=256             | H=512               | H=768                        |
|------|-------------------|-------------------|---------------------|------------------------------|
| L=2  | 2/128 (BERT-Tiny) | 2/256             | 2/512               | 2/768                        |
| L=4  | 4/128             | 4/256 (BERT-Mini) | 4/512 (BERT-Small)  | 4/768                        |
| L=6  | 6/128             | 6/256             | 6/512               | 6/768                        |
| L=8  | 8/128             | **8/256**         | 8/512 (BERT-Medium) | 8/768                        |
| L=10 | 10/128            | 10/256            | 10/512              | 10/768                       |
| L=12 | 12/128            | 12/256            | 12/512              | 12/768 (BERT-Base, original) |
## Usage
See other BERT model cards for usage details, e.g. https://huggingface.co/bert-base-uncased
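As a minimal sketch, the checkpoint loads like any other BERT model via the `transformers` library; the model ID below is a placeholder, substitute the actual repository hosting this checkpoint:

```python
from transformers import AutoTokenizer, AutoModel

# Placeholder ID; replace with the repository that hosts this checkpoint.
model_id = "bert_uncased_L-8_H-256"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Sanity-check the architecture: this variant has L=8 layers and H=256 hidden size.
print(model.config.num_hidden_layers, model.config.hidden_size)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 256)
```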
## Citation
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}