Turmbücher NER

A named entity recognition (NER) model for historical German, developed by Ismail Prada Ziegler as part of a research project in Digital Humanities at the University of Bern.

Performance

	PER	ORG	LOC	Micro-Avg
Precision	82.46%	28.81%	88.51%	81.21%
Recall	88.51%	44.74%	83.02%	83.99%
F1-Score	85.38%	35.05%	85.67%	82.57%

Note: ORG tags were annotated too inconsistently in the training data, so the model performs poorly on this class.
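
The F1-Score row is the harmonic mean of the Precision and Recall rows. The following is a minimal sketch that recomputes it from the percentages in the table; the function name `f1_score` and the `scores` dictionary are illustrative and not part of the model's code. Because the inputs are already rounded, the results can differ from the table in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the performance table above.
scores = {
    "PER": (82.46, 88.51),
    "ORG": (28.81, 44.74),
    "LOC": (88.51, 83.02),
    "Micro-Avg": (81.21, 83.99),
}

for label, (p, r) in scores.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}%")
```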

Initial experiments indicate that the model also performs reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
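
For readers unfamiliar with CER: it is commonly defined as the character-level edit distance between an automatic transcription and a reference transcription, divided by the length of the reference. The sketch below assumes that common definition; how the project itself measured CER is not stated here. The helper names `levenshtein` and `cer` and the example strings are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis relative to reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: one wrong character in a 20-character reference.
reference = "Hans Müller von Bern"
hypothesis = "Hans Muller von Bern"
print(f"CER = {cer(reference, hypothesis):.2%}")
```

A CER of around 5% therefore corresponds to roughly one character error per 20 characters of reference text.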

Main data set: Berner Turmbücher, early volumes from the 16th century, written in Early New High German; 61k tokens of training data.

Secondary data sets:

This project is still in progress.