Turmbücher NER

A named entity recognition (NER) model for historical German, developed by Ismail Prada Ziegler as part of a research project at Digital Humanities, University of Bern.

Performance

            PER      ORG      LOC      Micro-Avg
Precision   82.46%   28.81%   88.51%   81.21%
Recall      88.51%   44.74%   83.02%   83.99%
F1-Score    85.38%   35.05%   85.67%   82.57%

Note: ORG tags were annotated too inconsistently in the training data, so the model performs poorly on them.
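
As a quick sanity check on the table above, the F1-Score row can be recomputed from the precision and recall rows as their harmonic mean. The following is only a minimal sketch: the function name `f1_score` and the `scores` dictionary are illustrative and not part of the model or its evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, copied from the table above.
scores = {
    "PER": (82.46, 88.51),
    "ORG": (28.81, 44.74),
    "LOC": (88.51, 83.02),
    "Micro-Avg": (81.21, 83.99),
}

for label, (p, r) in scores.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}%")
```

The recomputed values agree with the reported F1 scores to within roughly 0.01 percentage points. The small differences are expected, because the inputs here are already rounded to two decimals.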

Initial experiments indicate that the model also performs reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
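
For readers unfamiliar with CER, the sketch below shows the usual definition: the character-level edit distance between a reference transcription and the automatic transcription, divided by the length of the reference. This card does not state how the CER was measured, so the definition is an assumption, and the helper names `levenshtein` and `cer` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis measured against reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up strings for illustration, not taken from the Turmbücher.
reference = "abcde" * 4               # 20 characters
hypothesis = "abcde" * 3 + "abcdx"    # one substituted character
print(f"CER = {cer(reference, hypothesis):.2%}")
```

Under this definition, a CER of around 5% corresponds to about one character error in every 20 reference characters.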

Data Set

Main data set: Berner Turmbücher, early volumes from the 16th century, Early New High German, 61k tokens of training data.

Secondary data sets:

  • SSRQ - Fribourg, language model + tagging, 59k tokens.
  • Chorgerichtsmanuale (unpublished), language model + tagging, 76k tokens.
  • Königsfelden Charters, language model, 623k tokens.
  • Talgerichtsprotokolle (unpublished), language model, 438k tokens.

Notice

This project is still in progress.
