PhilipMay committed on
Commit
5a79890
1 Parent(s): daaace7

remove old evaluation

Files changed (1)
  1. README.md +1 -21
README.md CHANGED
@@ -50,28 +50,8 @@ We evaluated all language models on GermEval18 with the F1 macro score. For each
 
  ![GermEval18 Coarse Model Evaluation for Version 2](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/model-eval-v2.png)
 
-
- ## Evaluation: GermEval18 Coarse
-
- | Model Name | F1 macro<br/>Mean | F1 macro<br/>Median | F1 macro<br/>Std |
- |---|---|---|---|
- | dbmdz-bert-base-german-europeana-cased | 0.727 | 0.729 | 0.00674 |
- | dbmdz-bert-base-german-europeana-uncased | 0.736 | 0.737 | 0.00476 |
- | dbmdz/electra-base-german-europeana-cased-discriminator | 0.745 | 0.745 | 0.00498 |
- | distilbert-base-german-cased | 0.752 | 0.752 | 0.00341 |
- | bert-base-german-cased | 0.762 | 0.761 | 0.00597 |
- | dbmdz/bert-base-german-cased | 0.765 | 0.765 | 0.00523 |
- | dbmdz/bert-base-german-uncased | 0.770 | 0.770 | 0.00572 |
- | **ELECTRA-base-german-uncased (this model)** | **0.778** | **0.778** | **0.00392** |
-
- - (1): Hyperparameters taken from the [FARM project](https://farm.deepset.ai/) "[germEval18Coarse_config.json](https://github.com/deepset-ai/FARM/blob/master/experiments/german-bert2.0-eval/germEval18Coarse_config.json)"
-
- ![GermEval18 Coarse Model Evaluation](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/model_eval.png)
-
  ## Checkpoint evaluation
- Since it it not guaranteed that the last checkpoint is the best, we evaluated the checkpoints on GermEval18. We found that the last checkpoint is indeed the best. The training was stable and did not overfit the text corpus. Below is a boxplot chart showing the different checkpoints.
-
- ![Checkpoint Evaluation on GermEval18](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/checkpoint_eval.png)
+ Since it it not guaranteed that the last checkpoint is the best, we evaluated the checkpoints on GermEval18. We found that the last checkpoint is indeed the best. The training was stable and did not overfit the text corpus.
 
  ## Pre-training details
 