# T5 v1.1 Base finetuned for CNN news summarization in Dutch 🇳🇱

This model is t5-v1.1-base-dutch-cased fine-tuned on CNN Dailymail NL.

For a demo of the Dutch CNN summarization models, head over to the Hugging Face Spaces for the Netherformer 📰 example application!

Rouge scores for this model are listed below.

## Tokenizer

• A SentencePiece tokenizer was trained from scratch for Dutch on mC4 nl cleaned, using scripts from the Hugging Face Transformers Flax examples.

## Dataset

All models listed below are trained on the full configuration (39B tokens) of cleaned Dutch mC4, which is the original mC4 with the following changes:

• Documents that contain words from a selection of the Dutch and English List of Dirty, Naughty, Obscene, and Otherwise Bad Words are removed
• Sentences with fewer than 3 words are removed
• Sentences containing a word of more than 1000 characters are removed
• Documents with fewer than 5 sentences are removed
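
The filtering rules above can be sketched in a few lines of Python. This is a minimal illustration, not the cleaning script that was actually used: the sentence splitter is a naive regex, `BAD_WORDS` is a hypothetical placeholder for the real word-list selection, and the assumption that the 5-sentence check runs after the sentence-level filters is mine.

```python
import re

# Hypothetical placeholder; the real selection of bad words is not reproduced here.
BAD_WORDS = {"badword"}

MIN_WORDS_PER_SENTENCE = 3
MAX_WORD_LENGTH = 1000
MIN_SENTENCES_PER_DOCUMENT = 5


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clean_document(text, bad_words=BAD_WORDS):
    """Return the cleaned document text, or None if the document is dropped."""
    # Rule 1: drop the whole document if it contains a listed word.
    words = {w.lower() for w in re.findall(r"\w+", text)}
    if words & set(bad_words):
        return None

    kept = []
    for sentence in split_sentences(text):
        tokens = sentence.split()
        # Rule 2: drop sentences with fewer than 3 words.
        if len(tokens) < MIN_WORDS_PER_SENTENCE:
            continue
        # Rule 3: drop sentences containing a word longer than 1000 characters.
        if any(len(tok) > MAX_WORD_LENGTH for tok in tokens):
            continue
        kept.append(sentence)

    # Rule 4: drop documents with fewer than 5 (remaining) sentences.
    if len(kept) < MIN_SENTENCES_PER_DOCUMENT:
        return None
    return " ".join(kept)
```

Returning `None` for dropped documents keeps the function easy to use in a filter step, for example `[c for c in map(clean_document, docs) if c is not None]`.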

## Models

TL;DR: yhavinga/t5-v1.1-base-dutch-cased is the best model.

• yhavinga/t5-base-dutch is a retraining of the Dutch T5 base v1.0 model that was trained during the Flax/JAX community week in the summer of 2021. Accuracy improved from 0.64 to 0.70.
• The two T5 v1.1 base models are an uncased and a cased version of t5-v1.1-base, again pre-trained from scratch on Dutch, with a tokenizer also trained from scratch. The t5 v1.1 models differ slightly from the t5 models, and the base models are trained with a dropout of 0.0. For fine-tuning, dropout is meant to be set back to 0.1.
• The large cased model is a pre-trained Dutch version of t5-v1.1-large. Training t5-v1.1-large proved difficult. Without dropout regularization, training would diverge at a certain point. With dropout, training went better, albeit much more slowly than training the t5 model. At some point convergence was too slow to warrant further training. The latest checkpoint, training scripts and metrics are available for reference. For actual fine-tuning, the cased base model is probably the better choice.

| | model | train seq len | acc | loss | batch size | epochs | steps | dropout | optim | lr | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|
| yhavinga/t5-base-dutch | T5 | 512 | 0.70 | 1.38 | 128 | 1 | 528481 | 0.1 | adafactor | 5e-3 | 2d 9h |
| yhavinga/t5-v1.1-base-dutch-uncased | t5-v1.1 | 1024 | 0.73 | 1.20 | 64 | 2 | 1014525 | 0.0 | adafactor | 5e-3 | 5d 5h |
| yhavinga/t5-v1.1-base-dutch-cased | t5-v1.1 | 1024 | 0.78 | 0.96 | 64 | 2 | 1210000 | 0.0 | adafactor | 5e-3 | 6d 6h |
| yhavinga/t5-v1.1-large-dutch-cased | t5-v1.1 | 512 | 0.76 | 1.07 | 64 | 1 | 1120000 | 0.1 | adafactor | 5e-3 | 8d 13h |
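
The note above about setting dropout back to 0.1 before fine-tuning a base model could look roughly like the sketch below. It assumes the `transformers` library with PyTorch weights available for the checkpoint; `set_dropout` and `load_for_finetuning` are hypothetical helper names, and `dropout_rate` is the dropout field on `T5Config`.

```python
def set_dropout(config, dropout_rate=0.1):
    """Set the dropout rate on a T5-style config object and return it."""
    config.dropout_rate = dropout_rate
    return config


def load_for_finetuning(model_id="yhavinga/t5-v1.1-base-dutch-cased", dropout_rate=0.1):
    """Load a pre-trained checkpoint with dropout re-enabled (hypothetical helper)."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoConfig, T5ForConditionalGeneration

    config = set_dropout(AutoConfig.from_pretrained(model_id), dropout_rate)
    return T5ForConditionalGeneration.from_pretrained(model_id, config=config)
```

Overriding the value on the loaded config, rather than editing the checkpoint, leaves the pre-trained weights untouched and only changes the regularization applied during fine-tuning.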

The cased t5-v1.1 Dutch models were fine-tuned for summarization on the CNN Daily Mail dataset.

| | model | input len | target len | Rouge1 | Rouge2 | RougeL | RougeLsum | Test Gen Len | epochs | batch size | steps | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yhavinga/t5-v1.1-base-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.8 | 13.6 | 25.2 | 32.1 | 79 | 6 | 64 | 26916 | 2h 40m |
| yhavinga/t5-v1.1-large-dutch-cnn-test | t5-v1.1 | 1024 | 96 | 34.4 | 13.6 | 25.3 | 31.7 | 81 | 5 | 16 | 89720 | 11h |
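
A fine-tuned checkpoint from the table above could be tried along the following lines. This is a sketch under several assumptions: the repository name is taken from the table and assumed to be the Hub id, the input and target lengths mirror the table (1024 and 96), no task prefix is added to the article because the card does not mention one, and `load` and `summarize` are hypothetical helper names.

```python
MODEL_ID = "yhavinga/t5-v1.1-base-dutch-cnn-test"  # as listed in the table; assumed Hub id
MAX_INPUT_LEN = 1024  # "input len" column
MAX_TARGET_LEN = 96   # "target len" column


def load(model_id=MODEL_ID):
    """Load tokenizer and model (requires transformers and a PyTorch backend)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def summarize(article, tokenizer, model,
              max_input_len=MAX_INPUT_LEN, max_target_len=MAX_TARGET_LEN):
    """Truncate the article to the input length and generate one summary."""
    inputs = tokenizer(article, max_length=max_input_len,
                       truncation=True, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_target_len)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load()` followed by `summarize(dutch_article, tokenizer, model)`. Generation settings such as beam search are left at the model defaults here; the settings used to produce the scores in the table are not stated.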

## Acknowledgements

This project would not have been possible without compute generously provided by Google through the TPU Research Cloud. The HuggingFace 🤗 ecosystem was also instrumental in many, if not all, parts of the training. The following repositories were helpful in setting up the TPU-VM and training the models:

Created by Yeb Havinga