---
license: apache-2.0
datasets:
  - bookcorpus
  - wikipedia
language:
  - en
---

# BERT L8-H256 (uncased)

One of the mini BERT models from https://arxiv.org/abs/1908.08962 that the HF team did not convert. The checkpoint was converted with the original conversion script.

See the original Google repo: [google-research/bert](https://github.com/google-research/bert)

Note: it is not clear whether these checkpoints underwent knowledge distillation.

## Model variants

## Usage

See other BERT model cards, e.g. https://huggingface.co/bert-base-uncased.
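A minimal usage sketch with the `transformers` library, assuming the repo ID follows this card's title (`gaunernst/bert-L8-H256-uncased`); adjust the ID if the actual repository name differs:

```python
# Minimal sketch: load the model and run one forward pass.
# NOTE: the model ID below is an assumption based on the card title.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "gaunernst/bert-L8-H256-uncased"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Hello, BERT!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The hidden size should match the H256 in the model name.
print(outputs.last_hidden_state.shape[-1])
```

For masked-language-modeling or fine-tuning, swap `AutoModel` for `AutoModelForMaskedLM` or `AutoModelForSequenceClassification` as usual.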

## Citation

```bibtex
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}
```