emanjavacas/MacBERTh


emanjavacas committed on Oct 31, 2023

Commit 3019f53 (1 parent: d709ce9)

Update README.md

Files changed (1): README.md (+20 -1)

@@ -1 +1,20 @@
-Documentation In Progress...

+# MacBERTh
+MacBERTh is a historical language model for English developed in the [MacBERTh project](https://macberth.netlify.app/).
+The architecture is BERT base uncased, as defined in the original BERT pre-training codebase.
+The training material comes from several sources, including:
+- EEBO
+- ECCO
+- COHA
+- CLMET3.1
+- EVANS
+- Hansard Corpus
+
+In total, the training material contains approximately 3.9B tokens.
+Details and evaluation can be found in the accompanying publications:
+- [MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950)](https://aclanthology.org/2021.nlp4dh-1.4/)
+- [Adapting vs. Pre-training Language Models for Historical Languages](https://doi.org/10.46298/jdmdh.9152)
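
The README describes a BERT base uncased model but gives no usage example. The following is a minimal sketch, not part of the commit, of how contextual sentence embeddings might be extracted. It assumes the checkpoint in this repository loads with the `AutoTokenizer` and `AutoModel` classes of the `transformers` library and that `torch` is installed. The helper name `embed_sentences` and the mean-pooling step are illustrative choices, not something the authors prescribe.

```python
MODEL_ID = "emanjavacas/MacBERTh"  # repository id as shown on this page


def embed_sentences(sentences, model_id=MODEL_ID):
    """Return one mean-pooled embedding per sentence (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the repository ships tokenizer and model files in a
    # format that the transformers Auto classes can load.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

    # Average over real tokens only; padding positions are masked out.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed_sentences(["Thou art a scholar.", "The parliament is dissolved."])` would download the checkpoint on first use and return a tensor with one row per input sentence. Mean pooling is only one of several common ways to turn token vectors into a sentence vector.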