---
language:
- en
inference: false
tags:
- BERT
- BNC-BERT
- encoder
license: cc-by-4.0
---

# BNC-BERT

- Paper: [Trained on 100 million words and still in shape: BERT meets British National Corpus](https://arxiv.org/abs/2303.09859)
- GitHub: [ltgoslo/ltg-bert](https://github.com/ltgoslo/ltg-bert)

## Example usage

This model currently requires the custom wrapper defined in `modeling_ltgbert.py`. With that file importable from your working directory, you can load the tokenizer and the model like this:

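```python
import torch
from transformers import AutoTokenizer

# Custom wrapper; modeling_ltgbert.py must be importable from here.
from modeling_ltgbert import LtgBertForMaskedLM

# "path/to/folder" is a placeholder for the folder holding the model files.
tokenizer = AutoTokenizer.from_pretrained("path/to/folder")
bert = LtgBertForMaskedLM.from_pretrained("path/to/folder")
```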
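
Once the tokenizer and model are loaded, a masked-token query might look like the sketch below. The helper names `top_k_token_ids` and `fill_mask` are hypothetical and not part of this repository. The sketch assumes that the wrapper's forward pass accepts `input_ids` and `attention_mask` and returns an object with a `.logits` tensor of shape (batch, sequence, vocabulary), as Hugging Face masked-LM heads usually do, and that the tokenizer defines a mask token. Neither assumption has been checked against `modeling_ltgbert.py`.

```python
def top_k_token_ids(scores, k=5):
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def fill_mask(text, tokenizer, model, k=5):
    """Suggest k candidate tokens for the first mask token in `text`.

    Assumption: `model(...)` returns an object with a `.logits` tensor of
    shape (batch, sequence, vocab). Raises ValueError if `text` has no mask.
    """
    import torch  # imported lazily so top_k_token_ids stays dependency-free

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        ).logits

    # Position of the first mask token in the (single) input sequence.
    token_ids = inputs["input_ids"][0].tolist()
    mask_position = token_ids.index(tokenizer.mask_token_id)

    # Rank the vocabulary by the logits predicted at the masked position.
    best_ids = top_k_token_ids(logits[0, mask_position].tolist(), k)
    return tokenizer.convert_ids_to_tokens(best_ids)
```

With the objects from the loading snippet, a call such as `fill_mask(f"It is a {tokenizer.mask_token} day.", tokenizer, bert)` would return the five highest-scoring subword tokens. Building the input from `tokenizer.mask_token` avoids hard-coding the mask string.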

## Please cite the following publication

```bibtex
@inproceedings{samuel-etal-2023-trained,
    title = "Trained on 100 million words and still in shape: {BERT} meets {B}ritish {N}ational {C}orpus",
    author = "Samuel, David and
      Kutuzov, Andrey and
      {\O}vrelid, Lilja and
      Velldal, Erik",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.146",
    pages = "1954--1974",
    abstract = "While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source {--} the British National Corpus. We show that pre-training on this carefully curated corpus can reach better performance than the original BERT model. We argue that this type of corpora has great potential as a language modeling benchmark. To showcase this potential, we present fair, reproducible and data-efficient comparative studies of LMs, in which we evaluate several training objectives and model architectures and replicate previous empirical results in a systematic way. We propose an optimized LM architecture called LTG-BERT.",
}
```