SpanMarker for GermEval 2014 NER

This is a SpanMarker model that was fine-tuned on the GermEval 2014 NER Dataset.

The GermEval 2014 NER Shared Task builds on a new dataset with German Named Entity annotation with the following properties: The data was sampled from German Wikipedia and News Corpora as a collection of citations. The dataset covers over 31,000 sentences corresponding to over 590,000 tokens. The NER annotation uses the NoSta-D guidelines, which extend the Tübingen Treebank guidelines, using four main NER categories with sub-structure, and annotating embeddings among NEs such as [ORG FC Kickers [LOC Darmstadt]].

Twelve classes of named entities are annotated and must be recognized: the four main classes PERson, LOCation, ORGanisation, and OTHer, each extended by two fine-grained sub-labels. -deriv marks derivations from NEs such as "englisch" ("English"), and -part marks compounds that include an NE as a subsequence, such as "deutschlandweit" ("Germany-wide").
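The count of 12 follows from combining each of the four main classes with three variants (plain, -deriv, -part). A minimal sketch of that enumeration is shown below; the concatenated spellings such as `LOCderiv` are an assumption about how the labels are written in the data files, and the names `MAIN_CLASSES`, `SUFFIXES`, and `germeval_classes` are illustrative only.

```python
from itertools import product

# The four main classes named in the task description.
MAIN_CLASSES = ["PER", "LOC", "ORG", "OTH"]
# Plain label plus the two fine-grained variants.
# Assumption: suffixes are appended directly, e.g. "LOCderiv".
SUFFIXES = ["", "deriv", "part"]


def germeval_classes():
    """Return every main class combined with every variant."""
    return [main + suffix for main, suffix in product(MAIN_CLASSES, SUFFIXES)]


labels = germeval_classes()
print(len(labels))
print(labels)
```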

Fine-Tuning

We use the same hyperparameters as in the "German's Next Language Model" paper, with the GWLMS TEAMS model as the backbone.

Evaluation is performed with SpanMarker's internal evaluation code, which uses seqeval.
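To illustrate what an entity-level F1-score measures, here is a simplified pure-Python sketch of exact-match span scoring. It is not seqeval and not SpanMarker's evaluation code; the function `span_f1`, the `(start, end, label)` tuple layout, and the toy spans are all assumptions made for the example.

```python
def span_f1(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    # A prediction counts only if boundaries and label both match.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy spans (hypothetical): a nested LOC inside an ORG, as in the task's annotation.
gold = [(0, 2, "PER"), (10, 12, "ORG"), (11, 12, "LOC")]
predicted = [(0, 2, "PER"), (10, 12, "LOC")]

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case only the PER span matches exactly, so a span with the right boundaries but the wrong label earns no credit.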

We fine-tune 5 models and upload the one with the best F1-score on the development set. Results on the development set are in parentheses:

Model	Run 1	Run 2	Run 3	Run 4	Run 5	Avg.
GWLMS TEAMS	(88.76) / 87.85	(88.54) / 87.77	(88.41) / 87.98	(88.86) / 87.81	(88.83) / 88.50	(88.68) / 87.98

The best model (Run 4, selected by its development F1-score of 88.86) achieves a final test score of 87.81%.
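The averages and the model selection can be reproduced from the table with a few lines of Python. This is a sketch using only the scores reported above; the variable names are illustrative.

```python
from statistics import mean

# (development F1, test F1) per run, copied from the results table.
runs = {
    "Run 1": (88.76, 87.85),
    "Run 2": (88.54, 87.77),
    "Run 3": (88.41, 87.98),
    "Run 4": (88.86, 87.81),
    "Run 5": (88.83, 88.50),
}

avg_dev = round(mean(dev for dev, _ in runs.values()), 2)
avg_test = round(mean(test for _, test in runs.values()), 2)

# Model selection uses the development score only, never the test score.
best_run = max(runs, key=lambda name: runs[name][0])
best_dev, best_test = runs[best_run]

print(f"Average: ({avg_dev}) / {avg_test}")
print(f"Best on development set: {best_run} ({best_dev}) / {best_test}")
```

Selecting by development score explains why the uploaded model is Run 4 even though Run 5 has the highest test score.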

Scripts for training and evaluation are also available.

Usage

The fine-tuned model can be used as follows:

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("gwlms/span-marker-teams-germeval14")

# Run inference
entities = model.predict("Jürgen Schmidhuber studierte ab 1983 Informatik und Mathematik an der TU München .")
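The result of `model.predict` can then be post-processed. The sketch below assumes each predicted entity is a dictionary with `"span"`, `"label"`, and `"score"` keys; that layout is an assumption to check against the SpanMarker documentation. The helper `format_entities` is hypothetical, and the sample values are hand-written placeholders, not actual model output.

```python
def format_entities(entities):
    """Render one tab-separated line per entity: label, text span, score."""
    return [
        f'{entity["label"]}\t{entity["span"]}\t{entity["score"]:.2f}'
        for entity in entities
    ]


# Placeholder data standing in for the return value of model.predict(...).
# These labels and scores are made up for illustration only.
example_entities = [
    {"span": "Jürgen Schmidhuber", "label": "PER", "score": 0.99},
    {"span": "TU München", "label": "ORG", "score": 0.98},
]

lines = format_entities(example_entities)
for line in lines:
    print(line)
```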
