emanjavacas committed on
Commit
3019f53
1 Parent(s): d709ce9

Update README.md

Files changed (1)
  1. README.md +20 -1
README.md CHANGED
@@ -1 +1,20 @@
- Documentation In Progress...
+ # MacBERTh
+
+ This model is a historical language model for English developed as part of the [MacBERTh project](https://macberth.netlify.app/).
+
+ The architecture is BERT base uncased, as implemented in the original BERT pre-training codebase.
+ The training material comes from several sources, including:
+
+ - EEBO
+ - ECCO
+ - COHA
+ - CLMET3.1
+ - EVANS
+ - Hansard Corpus
+
+ The combined training material amounts to approximately 3.9B tokens.
+
+ Details and evaluation can be found in the accompanying publications:
+ - [MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950)](https://aclanthology.org/2021.nlp4dh-1.4/)
+ - [Adapting vs. Pre-training Language Models for Historical Languages](https://doi.org/10.46298/jdmdh.9152)
+
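Because the README describes the architecture as BERT base uncased, input text is presumably case-folded before tokenization. The sketch below illustrates what "uncased" preprocessing typically involves in BERT-style tokenizers: lowercasing followed by removal of combining accent marks. The function name `uncased_normalize` is a hypothetical helper introduced here for illustration. It is not part of the MacBERTh release, and it omits punctuation splitting and WordPiece segmentation.

```python
import unicodedata


def uncased_normalize(text: str) -> str:
    """Lowercase text and strip combining accents (hypothetical helper).

    Assumption: this mirrors the usual 'uncased' BERT preprocessing step;
    it is a simplified sketch, not the project's actual tokenizer.
    """
    lowered = text.lower()
    # NFD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop the combining marks (Unicode category "Mn"), keeping base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(uncased_normalize("The Tragedie of MACBETH"))
```

One practical consequence of this kind of normalization is that capitalization differences in historical spelling (for example `Vertue` versus `vertue`) do not reach the model as distinct inputs.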