File size: 1,755 Bytes
eae090e
 
0ca9969
eae090e
 
 
 
e7379f1
 
eae090e
 
 
 
 
 
 
2ab7c15
eae090e
 
 
e7379f1
eae090e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
---
license: mit
language: de
---

# german-financial-statements-bert

This model is a fine-tuned version of [bert-base-german-cased](https://huggingface.co/bert-base-german-cased) using German financial statements.

It achieves the following results on the evaluation set:
- Loss: 1.2025
- Accuracy: 0.7376
- Perplexity: 3.3285

## Model description

Annual financial statements in Germany are published in the Federal Gazette and are freely accessible. The documents describe the entrepreneurial and in particular the financial situation of a company with reference to a reporting period. The german-financial-statements-bert model aims to provide a BERT model specifically for this domain.

## Training and evaluation data

The training was performed with 100,000 natural language sentences from annual financial statements. 50,000 of these sentences were taken unfiltered and randomly from 5,500 different financial statement documents, and another 50,000 were also taken randomly from 5,500 different financial statement documents, but this half was filtered so that only sentences referring to a financial entity were selected. Specifically, this means that the second half of the sentences contains an indicator for a reference to a financial entity (EUR, Euro, TEUR, €, T€). The evaluation was carried out with 20,000 sentences of the same origin and distribution.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

### Framework versions

- Transformers 4.17.0
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.6