---
language: nl
widget:
- text: "In het jaar 2030 zullen we"
- text: "Toen ik gisteren volledig in de ban was van"
- text: "Studenten en leraren van de Bogazici Universiteit in de Turkse stad Istanbul"
- text: "In Israël was een strenge lockdown"
tags:
- gpt-neo-1.3B
- gpt-neo
pipeline_tag: text-generation
datasets:
- yhavinga/mc4_nl_cleaned
---
# GPT Neo 1.3B pre-trained on cleaned Dutch mC4 🇳🇱

*NB: Training in progress.*

Dataset:

* [mC4 NL Cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
* dataset config: tiny (3B tokens)
* dataset config: large (24B tokens)

Tokenizer:

* Tokenizer trained on mC4 with scripts from the Hugging Face Transformers [Flax examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)

Training details:

* Trained for 70K steps (batch size 64) on mc4 nl tiny (1 epoch), reaching perplexity 27
* Trained for 940K steps (batch size 16) on mc4 nl full, reaching perplexity 16.1
* Training is ongoing
* Block size: 512
* Optimizer: adafactor
* lr: 5e-5
* Warmup steps: 5000

Work in progress as of January 2022.

* Many thanks to the [Google TPU Research Cloud](https://sites.research.google/trc/about/) for providing access to a TPU cluster!
* Thanks to @gsarti for creating the [t5-flax-gcp repository](https://github.com/gsarti/t5-flax-gcp).
* Also thanks to the creators of [gpt2-medium-persian](https://huggingface.co/flax-community/gpt2-medium-persian) and [gpt2-medium-indonesian](https://huggingface.co/flax-community/gpt2-medium-indonesian) for sharing their training scripts!
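
To put the step counts and perplexities from the training details in perspective, here is a rough back-of-the-envelope sketch. It is not part of the training code. It assumes that the listed batch size is the global batch size, that every sequence is filled to the block size of 512, and that perplexity is the exponential of the mean cross-entropy loss in nats. The function names `tokens_seen` and `loss_from_perplexity` are made up for this illustration.

```python
import math

BLOCK_SIZE = 512  # tokens per sequence, taken from the training details


def tokens_seen(steps: int, batch_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Tokens processed, assuming each sequence is filled to block_size
    and batch_size is the global batch size (both are assumptions)."""
    return steps * batch_size * block_size


def loss_from_perplexity(ppl: float) -> float:
    """Mean cross-entropy in nats, assuming ppl = exp(loss)."""
    return math.log(ppl)


# (run name, steps, batch size, reported perplexity) as listed in this card
runs = [
    ("mc4 nl tiny", 70_000, 64, 27.0),
    ("mc4 nl full", 940_000, 16, 16.1),
]

for name, steps, batch_size, ppl in runs:
    tokens = tokens_seen(steps, batch_size)
    loss = loss_from_perplexity(ppl)
    print(f"{name}: {tokens / 1e9:.2f}B tokens, loss ~ {loss:.2f} nats")
```

Under these assumptions the first run covers about 2.29B tokens and the second about 7.70B tokens so far, with losses of roughly 3.30 and 2.78 nats. If the batch size is per device or gradient accumulation was used, the token counts would scale accordingly.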