remove old evaluation
README.md
CHANGED
@@ -50,28 +50,8 @@ We evaluated all language models on GermEval18 with the F1 macro score. For each
 
 ![GermEval18 Coarse Model Evaluation for Version 2](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/model-eval-v2.png)
 
-
-## Evaluation: GermEval18 Coarse
-
-| Model Name | F1 macro<br/>Mean | F1 macro<br/>Median | F1 macro<br/>Std |
-|---|---|---|---|
-| dbmdz-bert-base-german-europeana-cased | 0.727 | 0.729 | 0.00674 |
-| dbmdz-bert-base-german-europeana-uncased | 0.736 | 0.737 | 0.00476 |
-| dbmdz/electra-base-german-europeana-cased-discriminator | 0.745 | 0.745 | 0.00498 |
-| distilbert-base-german-cased | 0.752 | 0.752 | 0.00341 |
-| bert-base-german-cased | 0.762 | 0.761 | 0.00597 |
-| dbmdz/bert-base-german-cased | 0.765 | 0.765 | 0.00523 |
-| dbmdz/bert-base-german-uncased | 0.770 | 0.770 | 0.00572 |
-| **ELECTRA-base-german-uncased (this model)** | **0.778** | **0.778** | **0.00392** |
-
-- (1): Hyperparameters taken from the [FARM project](https://farm.deepset.ai/) "[germEval18Coarse_config.json](https://github.com/deepset-ai/FARM/blob/master/experiments/german-bert2.0-eval/germEval18Coarse_config.json)"
-
-![GermEval18 Coarse Model Evaluation](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/model_eval.png)
-
 ## Checkpoint evaluation
-Since it is not guaranteed that the last checkpoint is the best, we evaluated the checkpoints on GermEval18. We found that the last checkpoint is indeed the best. The training was stable and did not overfit the text corpus.
-
-![Checkpoint Evaluation on GermEval18](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/checkpoint_eval.png)
+Since it is not guaranteed that the last checkpoint is the best, we evaluated the checkpoints on GermEval18. We found that the last checkpoint is indeed the best. The training was stable and did not overfit the text corpus.
 
 ## Pre-training details
 
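
Note on the removed table: its columns report the mean, median, and standard deviation of the macro F1 score for each model. The sketch below shows one way such numbers could be aggregated over repeated fine-tuning runs. It is not the evaluation code behind the table. The helper names `f1_macro` and `summarize`, the toy labels and predictions, and the choice of the sample standard deviation are all assumptions made for illustration; the diff does not state the number of runs or which standard deviation was used.

```python
from statistics import mean, median, stdev


def f1_macro(y_true, y_pred):
    """Macro F1: the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return mean(per_class)


def summarize(run_scores):
    """Aggregate per-run scores (sample standard deviation is an assumption)."""
    return {
        "mean": mean(run_scores),
        "median": median(run_scores),
        "std": stdev(run_scores),
    }


# Hypothetical toy data: one gold labelling and predictions from three runs.
gold = ["OFFENSE", "OTHER", "OTHER", "OFFENSE"]
runs = [
    ["OFFENSE", "OTHER", "OTHER", "OTHER"],
    ["OFFENSE", "OTHER", "OFFENSE", "OFFENSE"],
    ["OFFENSE", "OTHER", "OTHER", "OFFENSE"],
]

scores = [f1_macro(gold, pred) for pred in runs]
summary = summarize(scores)
print(summary)
```

Macro averaging weights both classes equally regardless of how often each occurs, which matters for a task with unequal class frequencies.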