tags:
- pytorch
- token-classification
- sequence-tagger-model
language: de
datasets:
- conll2003
- germeval_14
license: apache-2.0
About sbb_ner
This is a BERT model for named entity recognition (NER) in historical German.
It predicts the classes PER, LOC and ORG. The model is based on the 🤗 BERT base multilingual cased model.
We applied unsupervised pre-training on 2,333,647 pages of unlabeled historical German text from the Berlin State Library digital collections, and supervised pre-training on two datasets with contemporary German text, conll2003 and germeval_14.
For further details, have a look at sbb_ner on GitHub.
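A token classification model of this kind emits one tag per token, so downstream code usually has to merge those tags into entity spans. The following is a minimal sketch of that step, assuming the predictions use a BIO scheme (`B-PER`, `I-PER`, `O`, ...); the tagging scheme is an assumption and is not stated in this card. The function name `group_entities` and the sample sentence are hypothetical and are not part of sbb_ner.

```python
from typing import List, Tuple

# (label, start index, exclusive end index, surface text)
Entity = Tuple[str, int, int, str]


def group_entities(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Merge per-token BIO tags into entity spans.

    Assumption: tags look like "B-PER", "I-PER", "O". A stray "I-" tag
    that does not continue the current span starts a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Entity] = []
    start = None   # index where the open span began
    label = None   # class of the open span (PER, LOC or ORG)

    # The trailing "O" is a sentinel that flushes a span ending on the last token.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and start is not None and kind == label:
            continue  # still inside the open span
        if start is not None:
            entities.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        if prefix in ("B", "I") and kind:
            start, label = i, kind
    return entities


# Hypothetical example input, not real model output.
tokens = ["Wilhelm", "Grimm", "lebte", "in", "Berlin"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(group_entities(tokens, tags))
# [('PER', 0, 2, 'Wilhelm Grimm'), ('LOC', 4, 5, 'Berlin')]
```

Treating a stray `I-` tag as the start of a new span is a lenient design choice; a stricter decoder could discard such tags instead.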
Results
In a 5-fold cross-validation with different historical German NER corpora (see our KONVENS 2019 paper), the model obtained an F1 score of 84.3 ± 1.1%.
In the CLEF-HIPE-2020 Shared Task (paper), the model ranked 2nd of 13 systems for the German coarse NER task.
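A cross-validation score written as "mean ± spread" is commonly the mean and standard deviation of the per-fold scores. The sketch below shows that aggregation under this assumption; the card does not state which spread measure was used, so the choice of sample standard deviation is a guess. The helper name `summarize_folds` and the fold values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize_folds(scores: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold scores.

    Assumption: the spread is the sample standard deviation (n - 1 in
    the denominator); statistics.pstdev would give the population form.
    """
    if len(scores) < 2:
        raise ValueError("need at least two fold scores")
    return mean(scores), stdev(scores)


# Placeholder fold scores for illustration only (NOT the reported results).
fold_f1 = [80.0, 82.0, 84.0]
avg, spread = summarize_folds(fold_f1)
print(f"{avg:.1f} ± {spread:.1f}%")
# 82.0 ± 2.0%
```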
Weights
We provide model weights for PyTorch.
Model | Downloads |
---|---|
bert-sbb-de-finetuned | config.json • pytorch_model_ep7.bin • vocab.txt |