readme: add initial version of organization card \o/
README.md:

sdk: static
pinned: false
---

# hmBERT 64k

Historical Multilingual Language Models for Named Entity Recognition. The following languages are covered by hmBERT:

* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper)
* Swedish (Europeana Newspaper)

More details can be found in [our GitHub repository](https://github.com/dbmdz/clef-hipe) and in our
[hmBERT paper](https://ceur-ws.org/Vol-3180/paper-87.pdf).
<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
|
24 |
+
<p>
|
25 |
+
The hmBERT 64k model is a 12-layer BERT model with a 64k vocab.
|
26 |
+
</p>
|
27 |
+
</div>
|
28 |
+
|
29 |
+
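
As a quick check of the note above, the layer count and vocabulary size can be read straight from the model config, and the pretrained checkpoint can be probed with a fill-mask pipeline. Here is a minimal sketch using the Hugging Face `transformers` library; the Hub model ID below is an assumption, not confirmed by this card:

```python
# Minimal sketch, assuming hmBERT 64k is published on the Hugging Face Hub
# under this ID (the exact model ID is an assumption).
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"  # hypothetical ID

# Expect 12 hidden layers and a ~64k vocabulary, per the note above.
config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.vocab_size)

# hmBERT is pretrained with masked language modeling, so it can be probed
# with a fill-mask pipeline before any fine-tuning.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("The [MASK] Library is one of the largest libraries in the world."))
```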

# Leaderboard

We test our pretrained language models on various datasets from HIPE-2020, HIPE-2022 and Europeana.
The following table shows an overview of the datasets used:

| Language | Datasets                                                          |
|----------|-------------------------------------------------------------------|
| English  | [AjMC] - [TopRes19th]                                             |
| German   | [AjMC] - [NewsEye] - [HIPE-2020]                                  |
| French   | [AjMC] - [ICDAR-Europeana] - [LeTemps] - [NewsEye] - [HIPE-2020]  |
| Finnish  | [NewsEye]                                                         |
| Swedish  | [NewsEye]                                                         |
| Dutch    | [ICDAR-Europeana]                                                 |

[AjMC]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-ajmc.md
[NewsEye]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-newseye.md
[TopRes19th]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-topres19th.md
[ICDAR-Europeana]: https://github.com/stefan-it/historic-domain-adaptation-icdar
[LeTemps]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-letemps.md
[HIPE-2020]: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md

All results can be found in the [`hmLeaderboard`](https://huggingface.co/spaces/hmbench/hmLeaderboard).
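
The leaderboard scores come from fine-tuning the pretrained checkpoints on the NER datasets listed above. As a rough illustration of such a setup (a sketch, not the exact hmLeaderboard training configuration), a token-classification head can be attached on top of hmBERT with `transformers`; the model ID and label set are assumptions:

```python
# Rough sketch of NER fine-tuning on top of hmBERT 64k; the model ID and the
# label set below are assumptions, not the exact hmLeaderboard setup.
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "dbmdz/bert-base-historic-multilingual-64k-td-cased"  # hypothetical ID
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]  # assumed tag set

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# From here, training follows the usual token-classification recipe: tokenize
# with is_split_into_words=True, align labels to subword tokens, and evaluate
# with entity-level F1, as the HIPE shared tasks score entity spans.
```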
# Acknowledgements

We thank [Luisa März](https://github.com/LuisaMaerz), [Katharina Schmid](https://github.com/schmika) and
[Erion Çano](https://github.com/erionc) for their fruitful discussions about Historical Language Models.

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️