---
tags:
  - pytorch
  - token-classification
  - sequence-tagger-model
language: de
datasets:
  - conll2003
  - germeval_14
license: apache-2.0
---

# About sbb_ner

This is a BERT model for named entity recognition (NER) in historical German. It predicts the classes PER, LOC and ORG. The model is based on the 🤗 BERT base multilingual cased model.
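
A minimal usage sketch is shown below. It assumes the checkpoint in this repository loads directly through the Transformers token-classification pipeline, and that the model id `SBB/sbb_ner` follows this repository's path; adjust as needed.

```python
# Minimal sketch: query the model through the standard token-classification
# pipeline. Assumes the checkpoint is loadable with
# AutoModelForTokenClassification; the model id below follows this
# repository's path (SBB/sbb_ner).
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "SBB/sbb_ner"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "Friedrich Wilhelm besuchte die Königliche Bibliothek zu Berlin."
for entity in ner(text):
    # Each aggregated entity carries the predicted class (PER, LOC or ORG),
    # the matched span and a confidence score.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```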

We applied unsupervised pre-training to 2,333,647 pages of unlabeled historical German text from the Berlin State Library digital collections, and supervised pre-training on two datasets of contemporary German text, conll2003 and germeval_14.

For further details, have a look at sbb_ner on GitHub.

# Results

In a 5-fold cross-validation with different historical German NER corpora (see our KONVENS 2019 paper), the model obtained an F1 score of 84.3 ± 1.1%.

In the CLEF-HIPE-2020 Shared Task (paper), the model ranked 2nd of 13 systems for the German coarse NER task.

# Weights

We provide model weights for PyTorch.

| Model | Downloads |
|---|---|
| `bert-sbb-de-finetuned` | `config.json`, `pytorch_model_ep7.bin`, `vocab.txt` |
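
If you download the files above into a local directory, they can be loaded with the standard BERT classes. The snippet below is a sketch assuming the directory layout shown in the table and that the checkpoint is renamed to `pytorch_model.bin`, the default filename `from_pretrained` expects.

```python
# Sketch: load the downloaded weights from a local folder. Assumes the three
# files from the table above are stored in ./bert-sbb-de-finetuned and that
# pytorch_model_ep7.bin has been renamed to pytorch_model.bin.
from transformers import BertForTokenClassification, BertTokenizer

model_dir = "./bert-sbb-de-finetuned"

tokenizer = BertTokenizer.from_pretrained(model_dir)           # reads vocab.txt
model = BertForTokenClassification.from_pretrained(model_dir)  # reads config.json + pytorch_model.bin
model.eval()
```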