stefan-it's picture
readme: update model card
d1a8ca0
---
license: mit
language:
- en
- de
- fr
- fi
- sv
- nl
---
# hmByT5 - Preliminary Language Models
Preliminary Historic Multilingual and Monolingual ByT5 Models. Following languages are currently covered:
* English (British Library Corpus - Books)
* German (Europeana Newspaper)
* French (Europeana Newspaper)
* Finnish (Europeana Newspaper)
* Swedish (Europeana Newspaper)
* Dutch (Delpher Corpus)
More details can be found in [our GitHub repository](https://github.com/stefan-it/hmByT5).
# Pretraining
We pretrain hmByT5 on a v3-32 TPU Pod. Details about the training can be found
[here](https://github.com/stefan-it/hmByT5/tree/main/hmbyt5).
# Evaluation on Downstream Tasks (NER)
We evaluated the hmByT5 model that was pretrained on English AjMC corpus for 200k steps:
| Hyper-param Configuration | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|------------------------------------------|-------|-------|-------|-------|-------|--------------|
| `wsFalse-bs4-e10-lr0.00016-poolingfirst` | 83.80 | 84.78 | 83.74 | 83.35 | 84.37 | 84.01 ± 0.50 |
| `wsFalse-bs4-e10-lr0.00015-poolingfirst` | 84.67 | 82.69 | 83.92 | 84.53 | 82.90 | 83.74 ± 0.82 |
| `wsFalse-bs8-e10-lr0.00016-poolingfirst` | 82.12 | 83.82 | 83.37 | 83.00 | 83.70 | 83.20 ± 0.61 |
| `wsFalse-bs8-e10-lr0.00015-poolingfirst` | 83.45 | 82.83 | 84.15 | 81.76 | 83.78 | 83.19 ± 0.84 |
It turns out, that the results are not on-par with current SOTA on the English AjMC corpus, see a comparison
[here](https://github.com/stefan-it/blbooks-lms#model-zoo). Thus, we continue experiments with the Hugging Face
Transformers JAX/FLAX implementation to pretrain ByT5 models on TPU.
# Acknowledgements
Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many Thanks for providing access to the TPUs ❤️