
hmByT5 - Preliminary Language Models

Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:

  • Dutch (Delpher Corpus)

More details can be found in our GitHub repository.

Pretraining

We use the official JAX/FLAX example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU. Details about the training can be found here.

This model was trained with mean_noise_span_length=20.
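To give a feel for what this setting changes, here is a minimal sketch of the span-count arithmetic used in T5-style span corruption: a fixed fraction of the input is masked, and the mean span length decides how many separate spans that fraction is split into. The input length of 1024 bytes and the noise density of 0.15 are assumptions for illustration; neither value is stated in this model card, and the function name is hypothetical.

```python
def span_corruption_stats(length, noise_density, mean_noise_span_length):
    """Return (num_noise_tokens, num_noise_spans) for one input sequence.

    Follows the usual T5-style rounding: the number of masked tokens is a
    fixed fraction of the input, and the number of spans is that count
    divided by the mean span length.
    """
    num_noise_tokens = round(length * noise_density)
    # Keep at least one masked and one unmasked token.
    num_noise_tokens = min(max(num_noise_tokens, 1), length - 1)
    num_noise_spans = max(round(num_noise_tokens / mean_noise_span_length), 1)
    return num_noise_tokens, num_noise_spans


# Assumed values, not taken from the model card.
INPUT_LENGTH = 1024
NOISE_DENSITY = 0.15

for span_length in (3, 20):
    tokens, spans = span_corruption_stats(INPUT_LENGTH, NOISE_DENSITY, span_length)
    print(f"mean_noise_span_length={span_length}: "
          f"{tokens} masked bytes in about {spans} spans")
```

Under these assumptions the same number of bytes is masked in both cases, but a mean span length of 20 groups them into a few long spans, while a mean span length of 3 scatters them across many short ones.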

Evaluation on Downstream Tasks (NER)

We evaluated the hmByT5 model on the ICDAR Europeana dataset:

Configuration                            Run 1   Run 2   Run 3   Run 4   Run 5   Avg.
wsFalse-bs4-e10-lr0.00015-poolingfirst   86.61   85.88   87.65   87.93   88.01   87.22 ± 0.83
wsFalse-bs8-e10-lr0.00015-poolingfirst   87.88   87.56   85.62   86.52   87.03   86.92 ± 0.80
wsFalse-bs4-e10-lr0.00016-poolingfirst   86.17   85.87   87.77   86.58   87.96   86.87 ± 0.85
wsFalse-bs8-e10-lr0.00016-poolingfirst   87.67   86.02   85.66   87.00   85.99   86.47 ± 0.75
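The "Avg." column can be reproduced from the five runs. The sketch below assumes the value after ± is the population standard deviation (dividing by the number of runs), which matches the reported numbers; the model card does not state which estimator was used. The helper name `summarize` is hypothetical.

```python
from statistics import fmean, pstdev


def summarize(scores):
    """Format per-run scores as 'mean ± std' with two decimals.

    Uses the population standard deviation (pstdev); this is an assumption
    that happens to match the values reported in the table.
    """
    return f"{fmean(scores):.2f} ± {pstdev(scores):.2f}"


# Per-run scores copied from the first two table rows.
runs = {
    "wsFalse-bs4-e10-lr0.00015-poolingfirst": [86.61, 85.88, 87.65, 87.93, 88.01],
    "wsFalse-bs8-e10-lr0.00015-poolingfirst": [87.88, 87.56, 85.62, 86.52, 87.03],
}

for config, scores in runs.items():
    print(config, summarize(scores))
```

With the sample standard deviation (`statistics.stdev`) the spread for the first row would come out closer to 0.93, so the choice of estimator matters when comparing against other reports.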

The results show no performance improvement over the model trained with mean_noise_span_length=3, which achieved 87.90 ± 0.71.

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️
