Turmbücher NER

A named entity recognition (NER) model for historical German, developed by Ismail Prada Ziegler as part of a research project in Digital Humanities at the University of Bern.

Performance

Metric      PER      ORG      LOC      Micro-Avg
Precision   82.46%   28.81%   88.51%   81.21%
Recall      88.51%   44.74%   83.02%   83.99%
F1-Score    85.38%   35.05%   85.67%   82.57%
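
The F1 scores in the table are the harmonic mean of precision and recall. The following minimal sketch recomputes them from the reported precision and recall values. The function name `f1_score` and the `reported` dictionary are illustrative and not part of the model's code. The recomputed values agree with the reported ones to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per entity type, copied from the table above.
reported = {
    "PER": (82.46, 88.51, 85.38),
    "ORG": (28.81, 44.74, 35.05),
    "LOC": (88.51, 83.02, 85.67),
    "Micro-Avg": (81.21, 83.99, 82.57),
}

for label, (p, r, f1) in reported.items():
    print(f"{label}: recomputed F1 = {f1_score(p, r):.2f}%, reported = {f1:.2f}%")
```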

Note: ORG tags were too inconsistent in the training data, so the model performs poorly on this entity type.

Initial experiments indicate that the model also performs reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
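
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between an automatic transcription and a reference transcription, divided by the length of the reference. The sketch below illustrates that common definition. It is an assumption that the figure above was computed this way, and the function names and example strings are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis against a non-empty reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: one wrong character in a 16-character reference.
print(cer("der turm zu bern", "der turn zu bern"))
```

A CER of around 5% corresponds to roughly one character error in every twenty characters of the reference.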

Data Set

Main data set: Berner Turmbücher, early volumes from the 16th century, Early New High German, 61k tokens of training data.

Secondary data sets:

  • SSRQ - Fribourg, language model + tagging, 59k tokens.
  • Chorgerichtsmanuale (unpublished), language model + tagging, 76k tokens.
  • Königsfelden Charters, language model, 623k tokens.
  • Talgerichtsprotokolle (unpublished), language model, 438k tokens.

Notice

This project is still in progress.