fill-mask mask_token: [MASK]




Contributed by

redewiedergabe Redewiedergabe

How to use this model directly from the 🤗/transformers library:

from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("redewiedergabe/bert-base-historical-german-rw-cased")
model = AutoModelWithLMHead.from_pretrained("redewiedergabe/bert-base-historical-german-rw-cased")
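Beyond loading the weights, the model can be queried for masked-token predictions. The following is a minimal sketch, assuming a recent `transformers` release in which the `fill-mask` pipeline accepts `top_k` at call time. The helper names `mask_word` and `predict_masked` and the example sentence are illustrative and not part of the model card.

```python
MODEL_ID = "redewiedergabe/bert-base-historical-german-rw-cased"
MASK_TOKEN = "[MASK]"  # mask token listed for this model


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of the substring `word` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, MASK_TOKEN, 1)


def predict_masked(sentence: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    transformers is imported lazily so the helpers above can be used
    without it; the first call downloads the model weights.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in fill_mask(sentence, top_k=top_k)]


# Example (hypothetical sentence, requires network access on first run):
# masked = mask_word("Sie sagte, sie sei sehr müde.", "müde")
# for token, score in predict_masked(masked):
#     print(f"{token}\t{score:.3f}")
```

The sentence passed to `predict_masked` should contain exactly one `[MASK]` token.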

Model description


Trained on fictional and non-fictional German texts written between 1840 and 1920:

Hardware used

1 Tesla P4 GPU


Parameter                     Value
Epochs                        3
Gradient_accumulation_steps   1
Train_batch_size              32
Learning_rate                 0.00003
Max_seq_len                   128
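To make the relationship between these settings concrete, here is a small sketch that stores them in a dataclass and derives a few quantities. The class name `FineTuneConfig` and the corpus size in the usage comment are hypothetical; the model card does not state how many training examples were used. Frameworks differ on whether the batch size is multiplied or divided by the accumulation steps, but with an accumulation of 1 both conventions give 32 examples per optimizer update.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values taken from the parameter table above.
    epochs: int = 3
    gradient_accumulation_steps: int = 1
    train_batch_size: int = 32
    learning_rate: float = 3e-5  # written as 0.00003 in the table
    max_seq_len: int = 128

    @property
    def effective_batch_size(self) -> int:
        """Examples per optimizer update (assumes the multiply convention)."""
        return self.train_batch_size * self.gradient_accumulation_steps

    @property
    def max_tokens_per_update(self) -> int:
        """Upper bound on tokens seen per optimizer update."""
        return self.effective_batch_size * self.max_seq_len

    def optimizer_steps(self, num_examples: int) -> int:
        """Total optimizer updates for a corpus of `num_examples` sequences."""
        return self.epochs * math.ceil(num_examples / self.effective_batch_size)


config = FineTuneConfig()
# Hypothetical corpus size, for illustration only:
# config.optimizer_steps(1_000_000)
```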

Evaluation results: Automatic tagging of four forms of speech/thought/writing representation in historical fictional and non-fictional German texts

The language model was used to tag direct, indirect, reported, and free indirect speech/thought/writing representation in fictional and non-fictional German texts. The tagger is available and described in detail at

The tagging model was trained using the SequenceTagger class of the Flair framework (Akbik et al., 2019), which implements a BiLSTM-CRF architecture on top of a language embedding, as proposed by Huang et al. (2015).


Parameter         Value
Hidden_size       256
Learning_rate     0.1
Mini_batch_size   8
Max_epochs        150
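A tagger with these settings could be set up roughly as follows. This is a sketch under stated assumptions, not the authors' training script: it assumes Flair 0.10 or later (older releases use `make_tag_dictionary` instead of `make_label_dictionary`), a CoNLL-style column corpus with the token in column 0 and the tag in column 1, and a hypothetical tag type name `"speech"`. The function name `train_tagger` and its arguments are illustrative.

```python
MODEL_ID = "redewiedergabe/bert-base-historical-german-rw-cased"

# Values taken from the parameter table above.
TAGGER_PARAMS = {
    "hidden_size": 256,
    "learning_rate": 0.1,
    "mini_batch_size": 8,
    "max_epochs": 150,
}


def train_tagger(data_folder: str, output_dir: str, tag_type: str = "speech"):
    """Train a BiLSTM-CRF tagger on top of the BERT embeddings.

    Flair is imported lazily so this module loads without it installed.
    The corpus layout and `tag_type` are assumptions (see lead-in).
    """
    from flair.datasets import ColumnCorpus
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Column 0 holds the token, column 1 the tag (assumed layout).
    corpus = ColumnCorpus(data_folder, {0: "text", 1: tag_type})
    tag_dictionary = corpus.make_label_dictionary(label_type=tag_type)

    embeddings = TransformerWordEmbeddings(MODEL_ID)
    tagger = SequenceTagger(
        hidden_size=TAGGER_PARAMS["hidden_size"],
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type=tag_type,
        use_crf=True,  # CRF decoding layer on top of the BiLSTM
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(
        output_dir,
        learning_rate=TAGGER_PARAMS["learning_rate"],
        mini_batch_size=TAGGER_PARAMS["mini_batch_size"],
        max_epochs=TAGGER_PARAMS["max_epochs"],
    )
    return tagger
```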

Results are reported below in comparison to a custom-trained Flair embedding stacked onto a custom-trained fastText model. Both models were trained on the same dataset.

                BERT                       FastText+Flair             Test data
                F1    Precision  Recall    F1    Precision  Recall
Direct          0.80  0.86       0.74      0.84  0.90       0.79      historical German, fictional & non-fictional
Indirect        0.76  0.79       0.73      0.73  0.78       0.68      historical German, fictional & non-fictional
Reported        0.58  0.69       0.51      0.56  0.68       0.48      historical German, fictional & non-fictional
Free indirect   0.57  0.80       0.44      0.47  0.78       0.34      modern German, fictional
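As a sanity check on how the columns relate, F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the rounded precision and recall values in the table above and compares it with the reported F1. Small deviations are expected because the inputs are rounded to two decimals. The names `f1_score`, `REPORTED`, and `max_deviation` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported F1, precision, recall) copied from the results table.
REPORTED = {
    ("Direct", "BERT"): (0.80, 0.86, 0.74),
    ("Direct", "FastText+Flair"): (0.84, 0.90, 0.79),
    ("Indirect", "BERT"): (0.76, 0.79, 0.73),
    ("Indirect", "FastText+Flair"): (0.73, 0.78, 0.68),
    ("Reported", "BERT"): (0.58, 0.69, 0.51),
    ("Reported", "FastText+Flair"): (0.56, 0.68, 0.48),
    ("Free indirect", "BERT"): (0.57, 0.80, 0.44),
    ("Free indirect", "FastText+Flair"): (0.47, 0.78, 0.34),
}


def max_deviation(rows) -> float:
    """Largest gap between reported F1 and F1 recomputed from rounded inputs."""
    return max(abs(f1_score(p, r) - f1) for f1, p, r in rows.values())


for (category, system), (f1, p, r) in REPORTED.items():
    print(f"{category:14s} {system:15s} reported={f1:.2f} recomputed={f1_score(p, r):.4f}")
```

Every recomputed value lies within 0.01 of the reported F1, which is consistent with the column order F1, precision, recall.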

Intended use:

Historical German texts (1840 to 1920).

The model also showed good performance on modern German fictional texts.