Model description


Trained on fictional and non-fictional German texts written between 1840 and 1920.

Hardware used

1 Tesla P4 GPU


| Parameter | Value |
|---|---|
| Epochs | 3 |
| Gradient_accumulation_steps | 1 |
| Train_batch_size | 32 |
| Learning_rate | 0.00003 |
| Max_seq_len | 128 |
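
As a rough illustration of how the values listed above relate to each other, the following sketch collects them in a plain Python dataclass and derives the number of sequences per optimizer update. The class name `FineTuneConfig` and its field names are hypothetical and are not taken from the original training script. The sketch assumes that `Train_batch_size` is the batch size per forward pass; with an accumulation of 1, the result is 32 under either interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed above."""

    epochs: int = 3
    gradient_accumulation_steps: int = 1
    train_batch_size: int = 32
    learning_rate: float = 3e-5  # written as 0.00003 in the card
    max_seq_len: int = 128

    @property
    def effective_batch_size(self) -> int:
        # Number of sequences that contribute to one optimizer update.
        return self.train_batch_size * self.gradient_accumulation_steps

    @property
    def max_tokens_per_update(self) -> int:
        # Upper bound: every sequence padded or truncated to max_seq_len.
        return self.effective_batch_size * self.max_seq_len


config = FineTuneConfig()
print(config.effective_batch_size)
print(config.max_tokens_per_update)
```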

Evaluation results: Automatic tagging of four forms of speech/thought/writing representation in historical fictional and non-fictional German texts

The language model was used to tag direct, indirect, reported, and free indirect speech/thought/writing representation in fictional and non-fictional German texts. The tagger is available and described in detail at

The tagging model was trained using the SequenceTagger class of the Flair framework (Akbik et al., 2019), which implements a BiLSTM-CRF architecture on top of a language embedding, as proposed by Huang et al. (2015).


| Parameter | Value |
|---|---|
| Hidden_size | 256 |
| Learning_rate | 0.1 |
| Mini_batch_size | 8 |
| Max_epochs | 150 |
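
The tagger setup described above could look roughly like the following sketch. It assumes a recent Flair release (where `Corpus.make_label_dictionary` accepts a `label_type` argument) and a two-column CoNLL-style corpus. The function name `train_tagger`, the default label name `"speech"`, and all arguments (corpus folder, model identifier, output directory) are placeholders, since the card does not specify them. This is not the authors' training script.

```python
# Values taken from the parameter list above.
TAGGER_PARAMS = {"hidden_size": 256}
TRAIN_PARAMS = {"learning_rate": 0.1, "mini_batch_size": 8, "max_epochs": 150}


def train_tagger(data_folder, bert_model, output_dir, label_type="speech"):
    """Train a BiLSTM-CRF tagger on top of transformer embeddings.

    All arguments are placeholders: the corpus location, the model
    identifier, and the label column name are not given in this card.
    """
    # Imported lazily so the constants above are usable without Flair.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Assumed column layout: token in column 0, label in column 1.
    corpus = ColumnCorpus(data_folder, {0: "text", 1: label_type})
    label_dictionary = corpus.make_label_dictionary(label_type=label_type)

    tagger = SequenceTagger(
        embeddings=TransformerWordEmbeddings(bert_model),
        tag_dictionary=label_dictionary,
        tag_type=label_type,
        use_crf=True,  # CRF decoding layer on top of the BiLSTM
        **TAGGER_PARAMS,
    )
    trainer = ModelTrainer(tagger, corpus)
    trainer.train(output_dir, **TRAIN_PARAMS)
    return tagger
```

Keeping the hyperparameters in two dictionaries separates model construction (`hidden_size`) from the optimization settings passed to `ModelTrainer.train`.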

Results are reported below in comparison to a custom-trained Flair embedding stacked onto a custom-trained fastText model. Both models were trained on the same dataset.

| | BERT F1 | BERT Precision | BERT Recall | FastText+Flair F1 | FastText+Flair Precision | FastText+Flair Recall | Test data |
|---|---|---|---|---|---|---|---|
| Direct | 0.80 | 0.86 | 0.74 | 0.84 | 0.90 | 0.79 | historical German, fictional & non-fictional |
| Indirect | 0.76 | 0.79 | 0.73 | 0.73 | 0.78 | 0.68 | historical German, fictional & non-fictional |
| Reported | 0.58 | 0.69 | 0.51 | 0.56 | 0.68 | 0.48 | historical German, fictional & non-fictional |
| Free indirect | 0.57 | 0.80 | 0.44 | 0.47 | 0.78 | 0.34 | modern German, fictional |
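
To make the relationship between the three metrics explicit, the sketch below recomputes F1 from the precision and recall values reported above. It assumes the standard definition of F1 as the harmonic mean of precision and recall, which the card does not state explicitly. Because the published precision and recall are rounded to two decimals, small deviations from the reported F1 are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported F1, precision, recall), copied from the results above.
RESULTS = {
    ("Direct", "BERT"): (0.80, 0.86, 0.74),
    ("Direct", "FastText+Flair"): (0.84, 0.90, 0.79),
    ("Indirect", "BERT"): (0.76, 0.79, 0.73),
    ("Indirect", "FastText+Flair"): (0.73, 0.78, 0.68),
    ("Reported", "BERT"): (0.58, 0.69, 0.51),
    ("Reported", "FastText+Flair"): (0.56, 0.68, 0.48),
    ("Free indirect", "BERT"): (0.57, 0.80, 0.44),
    ("Free indirect", "FastText+Flair"): (0.47, 0.78, 0.34),
}

for (category, model), (reported, precision, recall) in RESULTS.items():
    recomputed = f1_score(precision, recall)
    print(
        f"{category:13s} {model:14s} "
        f"reported={reported:.2f} recomputed={recomputed:.3f}"
    )
```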

Intended use:

Historical German texts (1840 to 1920).

The model also showed good performance on modern German fictional texts.
