---
license: mit
language:
- en
---
# MacBERTh
MacBERTh is a historical language model for English from the [MacBERTh project](https://macberth.netlify.app/).
The architecture is that of BERT base uncased, and the model was trained with the original BERT pre-training codebase.
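Since the checkpoint follows the standard BERT layout, it can be loaded directly with the Hugging Face `transformers` library. The sketch below extracts contextual embeddings; the model ID `emanjavacas/MacBERTh` is an assumption inferred from this repository's path, so adjust it if the published ID differs.

```python
from transformers import AutoTokenizer, AutoModel

# Model ID assumed from this repository's path; adjust if it differs.
tokenizer = AutoTokenizer.from_pretrained("emanjavacas/MacBERTh")
model = AutoModel.from_pretrained("emanjavacas/MacBERTh")

# Encode a short passage of early-modern English and inspect the
# contextual embeddings from the final hidden layer.
inputs = tokenizer("Thou art a scholler; speake to it, Horatio.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch size, sequence length, hidden size)
```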
The training material comes from several sources, including:
- EEBO
- ECCO
- COHA
- CLMET3.1
- EVANS
- Hansard Corpus
These corpora amount to approximately 3.9 billion tokens in total.
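Because pre-training used BERT's masked-language-modelling objective, the model can also be queried for masked-token predictions over historical spelling. A minimal sketch via the `fill-mask` pipeline, again assuming the `emanjavacas/MacBERTh` model ID:

```python
from transformers import pipeline

# Model ID assumed from this repository's path; adjust if it differs.
fill = pipeline("fill-mask", model="emanjavacas/MacBERTh")

# BERT-style models predict the token behind [MASK] from its context.
for pred in fill("Thou shalt not [MASK] thy neighbour's house."):
    print(pred["token_str"], round(pred["score"], 3))
```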
Details and evaluation can be found in the accompanying publications:
- [MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950)](https://aclanthology.org/2021.nlp4dh-1.4/)
- [Adapting vs. Pre-training Language Models for Historical Languages](https://doi.org/10.46298/jdmdh.9152)