Historisches Grundbuch der Stadt Basel Nested NER

Named entity tagger for historical texts, mainly German from the 15th to 18th centuries.

Developed by Ismail Prada Ziegler.

A model for historical German developed as part of the project "Economies of Space. Practices, Discourses, and Actors on the Basel Real Estate Market (1400-1700)" at the University of Basel and the Digital Humanities Bern. This model was created to annotate nested document structures. It can also be used to annotate flat text (as in the example), but may perform slightly worse than models trained only for that task. You can annotate nested tags by using this script. More information on this model can be found here.
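The nested annotation script itself is not reproduced here. As a rough illustration of what annotating recursively can mean, the sketch below applies a flat tagger to the full text and then again to the text inside each entity it finds. The `tag` callable, the toy lookup table standing in for the model, the example sentence and the `max_depth` limit are all hypothetical; they are not the script's actual interface. In practice `tag` would wrap this model.

```python
from typing import Callable, List, Tuple

# (start, end, label) with character offsets; end is exclusive.
Span = Tuple[int, int, str]
# Hypothetical interface: any flat tagger that returns top-level spans for a string.
Tagger = Callable[[str], List[Span]]


def annotate_recursive(text: str, tag: Tagger, max_depth: int = 3) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, label, depth) for entities found by re-tagging inside each entity."""
    results: List[Tuple[int, int, str, int]] = []

    def recurse(offset: int, segment: str, depth: int) -> None:
        if depth >= max_depth:
            return
        for start, end, label in tag(segment):
            # Guard: if the tagger re-finds the whole entity span, skip it instead of looping.
            if depth > 0 and start == 0 and end == len(segment):
                continue
            results.append((offset + start, offset + end, label, depth))
            # Tag the text inside this entity to find entities nested within it.
            recurse(offset + start, segment[start:end], depth + 1)

    recurse(0, text, 0)
    # Outer entities come before inner entities that start at the same position.
    return sorted(results, key=lambda r: (r[0], -r[1]))


# Toy stand-in for the model: a fixed lookup table (hypothetical example data).
TOY_OUTPUT = {
    "Hans Meyer von Basel verkauft": [(0, 20, "PER")],
    "Hans Meyer von Basel": [(0, 10, "PER"), (15, 20, "LOC")],
}


def toy_tag(segment: str) -> List[Span]:
    return TOY_OUTPUT.get(segment, [])


if __name__ == "__main__":
    sentence = "Hans Meyer von Basel verkauft"
    for start, end, label, depth in annotate_recursive(sentence, toy_tag):
        # Indent by nesting depth to show the structure.
        print("  " * depth + f"{label}: {sentence[start:end]}")
```

With the toy tagger this prints the person entity "Hans Meyer von Basel" with "Hans Meyer" (PER) and "Basel" (LOC) nested inside it. The depth limit and the whole-span guard are simple safeguards against a tagger that keeps re-labelling the same span.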

Performance

Scores per entity type when annotating recursively:

             PER      ORG      LOC
Precision    86.30%   82.69%   82.79%
Recall       85.82%   74.14%   78.46%
F1-Score     86.06%   78.18%   80.57%
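Each F1-score in the table is the harmonic mean of the corresponding precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the snippet below recomputes the F1 row from the reported precision and recall values; it uses no data beyond the table.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1-score), on whatever scale the inputs use."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the table above.
scores = {
    "PER": (86.30, 85.82),
    "ORG": (82.69, 74.14),
    "LOC": (82.79, 78.46),
}

if __name__ == "__main__":
    for label, (p, r) in scores.items():
        print(f"{label}: F1 = {f1(p, r):.2f}%")
```

Rounded to two decimals, the results match the F1-Score row (86.06%, 78.18%, 80.57%).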

Dataset

A not yet published dataset created from the Historical Land Registry of the City of Basel (HLRB). Timeframe: 1400-1700. Language: Early New High German. 661 documents in the training split, 83 in the development split. The language model is based on the full HLRB corpus up to 1800, approximately 120k documents.

The documents were annotated according to the BeNASch annotation guidelines. For this model, a simplified tagset was used.

The training data was prepared in a special way to accommodate nested annotation. See the paper cited below for details.
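The exact preparation is described in the paper and is not reproduced here. Purely as an assumption, one way to make nested annotations trainable with a flat BIO tagger, in line with recursive annotation, is to emit one training sequence for the full text (top-level entities only) plus one sequence for the tokens inside every entity (its direct children only). The helper names `to_bio` and `flatten_nested`, the entity tuple layout and the example sentence are hypothetical.

```python
from typing import List, Tuple

# (start, end, label, children) with token offsets into the full document; end is exclusive.
Entity = Tuple[int, int, str, list]
Example = Tuple[List[str], List[str]]


def to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode non-overlapping (start, end, label) spans as BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def flatten_nested(tokens: List[str], entities: List[Entity], offset: int = 0) -> List[Example]:
    """Turn one nested annotation into several flat BIO examples, one per nesting context.

    `tokens` is the current segment; `offset` is its start position in the full document.
    """
    # Only entities at this level are tagged; deeper ones get their own example.
    local = [(s - offset, e - offset, label) for s, e, label, _ in entities]
    examples: List[Example] = [(tokens, to_bio(len(tokens), local))]
    for s, e, _, children in entities:
        # Every entity span becomes a segment, even without children (all "O"):
        # if inference re-tags every entity span, leaf entities need negative examples too.
        examples.extend(flatten_nested(tokens[s - offset:e - offset], children, offset=s))
    return examples


tokens = ["Hans", "Meyer", "von", "Basel", "verkauft"]
entities: List[Entity] = [(0, 4, "PER", [(0, 2, "PER", []), (3, 4, "LOC", [])])]

if __name__ == "__main__":
    for seg_tokens, tags in flatten_nested(tokens, entities):
        print(list(zip(seg_tokens, tags)))
```

The nested sentence yields four flat examples: the full sentence with only the outer PER, the outer entity with its inner PER and LOC, and one all-"O" example for each leaf entity.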

Citation

If you publish works using this model, please cite:

Prada Ziegler, I. (2024, May 30). What's in an entity? Exploring Nested Named Entity Recognition in the Historical Land Register of Basel (1400-1700). DH Benelux 2024, Leuven, Belgium. Zenodo. https://doi.org/10.5281/zenodo.11394453
