---
license: mit
language:
- en
---
|
# MacBERTh |
|
|
|
This model is a historical language model for English, developed as part of the [MacBERTh project](https://macberth.netlify.app/).
|
|
|
The architecture is based on BERT base uncased, and the model was pre-trained with the original BERT pre-training codebase.
|
The training material comes from several sources, including:
|
|
|
- EEBO |
|
- ECCO |
|
- COHA |
|
- CLMET3.1 |
|
- EVANS |
|
- Hansard Corpus |
|
|
|
for a total of approximately 3.9 billion tokens.
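As a quick illustration, the sketch below loads the model for masked-token prediction with the Hugging Face `transformers` library. The model identifier `emanjavacas/MacBERTh` is an assumption; substitute the repository name under which the checkpoint is actually hosted.

```python
# Minimal sketch: masked-token prediction with MacBERTh.
# NOTE: the model id "emanjavacas/MacBERTh" is an assumption; adjust if needed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="emanjavacas/MacBERTh")

# BERT-style models predict the [MASK] token from its surrounding context.
for prediction in fill_mask("Thou [MASK] not covet thy neighbour's house."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```

The pipeline returns the top-scoring candidates for the masked position, which also serves as a quick smoke test that the checkpoint loaded correctly.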
|
|
|
Details and evaluation can be found in the accompanying publications: |
|
- [MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950)](https://aclanthology.org/2021.nlp4dh-1.4/) |
|
- [Adapting vs. Pre-training Language Models for Historical Languages](https://doi.org/10.46298/jdmdh.9152) |