---
license: mit
language:
  - en
  - de
  - fr
  - fi
  - sv
  - nl
---

# hmByT5 - Preliminary Language Models

Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:

- English (British Library Corpus - Books)
- German (Europeana Newspaper)
- French (Europeana Newspaper)
- Finnish (Europeana Newspaper)
- Swedish (Europeana Newspaper)
- Dutch (Delpher Corpus)

More details can be found in our GitHub repository.

# Pretraining

We use the official JAX/FLAX example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU. Details about the training can be found here.

This model was trained with `mean_noise_span_length=20` for one epoch.
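
A pretrained checkpoint can be loaded with the standard Transformers classes. The snippet below is only a minimal sketch; the model identifier is a placeholder and should be replaced with the id of the checkpoint you actually want to use:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder identifier; replace with the id of the actual hmByT5 checkpoint.
model_id = "stefan-it/hmbyt5-preliminary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# ByT5 operates directly on raw UTF-8 bytes, so no language-specific
# preprocessing or subword vocabulary is required.
inputs = tokenizer("The 19th century newspaper archives are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```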

# Evaluation on Downstream Tasks (NER)

See detailed results at hmLeaderboard.
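
Purely as an illustration of how the byte-level encoder could feed a downstream token tagger (this is not the evaluation setup behind hmLeaderboard, and the model id below is a placeholder), encoder features can be extracted like this:

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

# Placeholder identifier; replace with the id of the actual hmByT5 checkpoint.
model_id = "stefan-it/hmbyt5-preliminary"

tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = T5EncoderModel.from_pretrained(model_id)
encoder.eval()

sentence = "Berlin, den 12. März 1885."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    hidden_states = encoder(**inputs).last_hidden_state

# ByT5 tokenizes text into UTF-8 bytes, so there is one hidden state per byte
# (plus the final </s> token); an NER head would be trained on top of these.
print(hidden_states.shape)  # (1, number_of_byte_tokens, hidden_size)
```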

# Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️