german-financial-statements-bert

This model is a version of bert-base-german-cased fine-tuned on German financial statements.

It achieves the following results on the evaluation set:

  • Loss: 1.2025
  • Accuracy: 0.7376
  • Perplexity: 3.3285
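The perplexity above can be related to the loss with a short check. This is a minimal sketch that assumes the reported loss is a mean cross-entropy measured in nats, which the card does not state explicitly; under that assumption, perplexity is the exponential of the loss.

```python
import math

eval_loss = 1.2025  # evaluation loss reported above

# Assumption: the loss is a mean cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.4f}")
```

The computed value matches the reported perplexity up to the rounding of the loss to four decimal places.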

Model description

Annual financial statements in Germany are published in the Federal Gazette (Bundesanzeiger) and are freely accessible. These documents describe a company's business situation, and in particular its financial situation, for a given reporting period. The german-financial-statements-bert model aims to provide a BERT model specifically for this domain.

Training and evaluation data

The model was trained on 100,000 natural-language sentences from annual financial statements. Half of them (50,000) were sampled at random, without filtering, from 5,500 different financial statement documents. The other 50,000 were also sampled at random from 5,500 different financial statement documents, but only from sentences that refer to a financial entity, that is, sentences containing one of the indicators EUR, Euro, TEUR, € or T€. Evaluation used 20,000 sentences of the same origin and distribution.
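The indicator-based filter described above could be sketched as follows. This is an illustration, not the original preprocessing code: the function name `has_financial_indicator`, the regular expression, and the choice to match the alphabetic indicators only as standalone tokens (so that "Europa" does not count as "Euro") are all assumptions.

```python
import re

# Indicators listed in the data description: EUR, Euro, TEUR, €, T€.
# "T€" needs no separate pattern because it always contains "€".
# Assumption: alphabetic indicators must not be embedded in a longer word.
_LETTERS = "A-Za-zÄÖÜäöüß"
FINANCIAL_INDICATOR = re.compile(
    rf"€|(?<![{_LETTERS}])(?:TEUR|EUR|Euro)(?![{_LETTERS}])"
)


def has_financial_indicator(sentence: str) -> bool:
    """Return True if the sentence contains one of the financial indicators."""
    return FINANCIAL_INDICATOR.search(sentence) is not None


# Made-up example sentences, for illustration only.
sentences = [
    "Die Umsatzerlöse betrugen 5.000 TEUR.",
    "Rückstellungen in Höhe von 12 T€ wurden gebildet.",
    "Die Gesellschaft ist in Europa tätig.",
    "Der Jahresabschluss wurde geprüft.",
]

filtered = [s for s in sentences if has_financial_indicator(s)]
print(filtered)
```

A plain substring test would be simpler but would also accept words such as "Europa"; whether the original filter made that distinction is not documented.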

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
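To make the linear scheduler concrete, the sketch below computes how the learning rate would decay over training. It rests on several assumptions that the card does not confirm: each of the 100,000 training sentences is one training example, training ran on a single device without gradient accumulation, and there was no warmup phase. The helper `linear_lr` is hypothetical and only mirrors a linear decay to zero.

```python
import math

learning_rate = 5e-05
train_batch_size = 8
num_epochs = 3
num_train_sentences = 100_000  # from the training data description

# Assumptions: one example per sentence, one device,
# no gradient accumulation, no warmup steps.
steps_per_epoch = math.ceil(num_train_sentences / train_batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates, decaying linearly to zero."""
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / total_steps


for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 5e-05 and reaches zero at the final step; the actual step count may differ if sentences were grouped into longer sequences before batching.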

Framework versions

  • Transformers 4.17.0
  • PyTorch 1.10.0+cu111
  • Datasets 1.18.3
  • Tokenizers 0.11.6