readme: add benchmarks on NER datasets

README.md

In total, the pretraining corpus has a size of 133GB.

## Benchmarks (Named Entity Recognition)

We compare our Zeitungs-LM directly with the Europeana BERT model (Zeitungs-LM is intended as its successor) on various downstream tasks from the [hmBench](https://github.com/stefan-it/hmBench) repository, which focuses on Named Entity Recognition.
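
A minimal fine-tuning setup for such a comparison could look like the following sketch, using the Hugging Face `transformers` library. The model id and label count below are hypothetical placeholders, not the actual repository name:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Hypothetical model id -- replace it with the actual Hugging Face repo name.
model_id = "stefan-it/zeitungs-lm-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Token classification head on top of the pretrained encoder; the number
# of labels depends on the tag set of the respective NER dataset.
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)
```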

In addition, we use two further datasets (ONB and LFT) from the ["A Named Entity Recognition Shootout for German"](https://aclanthology.org/P18-2020/) paper.

We report the micro F1-score averaged over 5 runs with different seeds, and use the best hyper-parameter configuration on the development set of each dataset to report the final test score.
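
The scoring protocol can be illustrated with a small sketch: compute the entity-level micro F1-score per seed (e.g. with the `seqeval` library) and average the results. All label sequences below are made-up placeholders:

```python
from statistics import mean
from seqeval.metrics import f1_score

# Gold labels for one sentence (IOB format) -- placeholder data.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]

# Hypothetical predictions from 5 fine-tuning runs with different seeds.
runs = [
    [["B-PER", "I-PER", "O", "B-LOC"]],  # seed 1: perfect
    [["B-PER", "I-PER", "O", "O"]],      # seed 2: misses the location
    [["B-PER", "I-PER", "O", "B-LOC"]],  # seed 3: perfect
    [["B-PER", "O", "O", "B-LOC"]],      # seed 4: truncates the person span
    [["B-PER", "I-PER", "O", "B-LOC"]],  # seed 5: perfect
]

scores = [f1_score(gold, pred, average="micro") for pred in runs]
print(f"micro F1 per seed: {[round(s, 4) for s in scores]}")
print(f"averaged micro F1: {mean(scores):.4f}")
```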

### Development Set

The results on the development set can be seen in the following table:

| Model \ Dataset     | [LFT][1] | [ONB][2] | [HisGermaNER][3] | [HIPE-2020][4] | [NewsEye][5] | [AjMC][6] | Avg.      |
|:--------------------|:---------|:---------|:-----------------|:---------------|:-------------|:----------|:----------|
| [Europeana BERT][7] | 79.22    | 88.20    | 81.41            | 80.92          | 41.65        | 87.91     | 76.55     |
| Zeitungs-LM v1      | 79.39    | 88.53    | 83.10            | 81.55          | 44.53        | 89.71     | **77.80** |

On average, our Zeitungs-LM outperforms the German Europeana BERT model by 1.25 percentage points (77.80 vs. 76.55 F1).

### Test Set

The final results on the test set can be seen here:

| Model \ Dataset     | [LFT][1] | [ONB][2] | [HisGermaNER][3] | [HIPE-2020][4] | [NewsEye][5] | [AjMC][6] | Avg.      |
|:--------------------|:---------|:---------|:-----------------|:---------------|:-------------|:----------|:----------|
| [Europeana BERT][7] | 80.43    | 84.39    | 83.21            | 77.49          | 42.96        | 90.52     | 76.50     |
| Zeitungs-LM v1      | 80.35    | 87.28    | 84.92            | 79.91          | 47.16        | 92.76     | **78.73** |

On the test sets, our Zeitungs-LM beats the German Europeana BERT model by a large margin (2.23 percentage points on average).

[1]: https://aclanthology.org/P18-2020/
[2]: https://aclanthology.org/P18-2020/
[3]: https://huggingface.co/datasets/stefan-it/HisGermaNER
[4]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md
[5]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
[6]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
[7]: https://huggingface.co/dbmdz/bert-base-german-europeana-cased

# Changelog

* 02.10.2024: Initial version of the model. More details are coming very soon!

# Acknowledgements