	---
	license: apache-2.0
	datasets:
	- bookcorpus
	- wikipedia
	language:
	- en
	---

	# BERT L2-H768 (uncased)

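	One of the miniature BERT models from [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) that the Hugging Face team did not convert. The checkpoint was converted with the original [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py).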

	See the original Google repo: [google-research/bert](https://github.com/google-research/bert)

	Note: it's not clear if these checkpoints have undergone knowledge distillation.

	## Model variants

	| |H=128|H=256|H=512|H=768|
	|---|:---:|:---:|:---:|:---:|
	| L=2 |[2/128 (BERT-Tiny)][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]|
	| L=4 |[4/128][4_128]|[4/256 (BERT-Mini)][4_256]|[4/512 (BERT-Small)][4_512]|[4/768][4_768]|
	| L=6 |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]|
	| L=8 |[8/128][8_128]|[8/256][8_256]|[8/512 (BERT-Medium)][8_512]|[8/768][8_768]|
	| L=10 |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]|
	| L=12 |[12/128][12_128]|[12/256][12_256]|[12/512][12_512]|[12/768 (BERT-Base, original)][12_768]|

	[2_128]: https://huggingface.co/gaunernst/bert-tiny-uncased
	[2_256]: https://huggingface.co/gaunernst/bert-L2-H256-uncased
	[2_512]: https://huggingface.co/gaunernst/bert-L2-H512-uncased
	[2_768]: https://huggingface.co/gaunernst/bert-L2-H768-uncased
	[4_128]: https://huggingface.co/gaunernst/bert-L4-H128-uncased
	[4_256]: https://huggingface.co/gaunernst/bert-mini-uncased
	[4_512]: https://huggingface.co/gaunernst/bert-small-uncased
	[4_768]: https://huggingface.co/gaunernst/bert-L4-H768-uncased
	[6_128]: https://huggingface.co/gaunernst/bert-L6-H128-uncased
	[6_256]: https://huggingface.co/gaunernst/bert-L6-H256-uncased
	[6_512]: https://huggingface.co/gaunernst/bert-L6-H512-uncased
	[6_768]: https://huggingface.co/gaunernst/bert-L6-H768-uncased
	[8_128]: https://huggingface.co/gaunernst/bert-L8-H128-uncased
	[8_256]: https://huggingface.co/gaunernst/bert-L8-H256-uncased
	[8_512]: https://huggingface.co/gaunernst/bert-medium-uncased
	[8_768]: https://huggingface.co/gaunernst/bert-L8-H768-uncased
	[10_128]: https://huggingface.co/gaunernst/bert-L10-H128-uncased
	[10_256]: https://huggingface.co/gaunernst/bert-L10-H256-uncased
	[10_512]: https://huggingface.co/gaunernst/bert-L10-H512-uncased
	[10_768]: https://huggingface.co/gaunernst/bert-L10-H768-uncased
	[12_128]: https://huggingface.co/gaunernst/bert-L12-H128-uncased
	[12_256]: https://huggingface.co/gaunernst/bert-L12-H256-uncased
	[12_512]: https://huggingface.co/gaunernst/bert-L12-H512-uncased
	[12_768]: https://huggingface.co/bert-base-uncased
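
The naming scheme in the link list above can be sketched as a small lookup. This is only an illustration: `repo_id`, `NAMED_VARIANTS`, `LAYERS` and `HIDDEN_SIZES` are hypothetical names introduced here, and the mapping simply mirrors the links listed in this card.

```python
# Checkpoints published under a nickname rather than the
# bert-L{L}-H{H}-uncased pattern, copied from the link list in this card.
NAMED_VARIANTS = {
    (2, 128): "gaunernst/bert-tiny-uncased",
    (4, 256): "gaunernst/bert-mini-uncased",
    (4, 512): "gaunernst/bert-small-uncased",
    (8, 512): "gaunernst/bert-medium-uncased",
    (12, 768): "bert-base-uncased",  # original BERT-Base, outside this namespace
}

LAYERS = (2, 4, 6, 8, 10, 12)
HIDDEN_SIZES = (128, 256, 512, 768)


def repo_id(layers: int, hidden: int) -> str:
    """Return the Hub repository id for a given (L, H) cell of the variants table."""
    if layers not in LAYERS or hidden not in HIDDEN_SIZES:
        raise ValueError(f"no variant listed for L={layers}, H={hidden}")
    default = f"gaunernst/bert-L{layers}-H{hidden}-uncased"
    return NAMED_VARIANTS.get((layers, hidden), default)
```

For example, `repo_id(2, 768)` gives the id of this repository, `gaunernst/bert-L2-H768-uncased`.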

	## Usage

	Usage is the same as for any other BERT checkpoint; see, for example, the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model card.
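
A minimal sketch of loading this checkpoint with the `transformers` Auto classes follows. It assumes `transformers` and `torch` are installed and that this repository ships tokenizer files alongside the weights, which is not confirmed here. `MODEL_ID` and `embed` are illustrative names, not part of any library.

```python
MODEL_ID = "gaunernst/bert-L2-H768-uncased"


def embed(text: str, model_id: str = MODEL_ID):
    """Return the last hidden states for `text`.

    The checkpoint is downloaded from the Hub on first use.
    """
    # Imported lazily so this module can be imported without
    # transformers/torch being installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the repository contains tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Shape: (batch, sequence_length, hidden_size)
    return outputs.last_hidden_state
```

Calling `embed("Hello, world!")` should return a tensor whose last dimension is 768, matching H=768 in the model name.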

	## Citation

	```bibtex
	@article{turc2019,
	title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
	author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
	journal={arXiv preprint arXiv:1908.08962v2},
	year={2019}
	}
	```