---
language:
  - nl
datasets:
  - yhavinga/mc4_nl_cleaned
tags:
  - seq2seq
  - lm-head
license: apache-2.0
inference: false
---

# A collection of Dutch T5 models

Work in progress, December 2021.

|                     | t5-base-dutch           | t5-v1.1-base-dutch      | t5-v1.1-large-dutch-cased | t5-v1.1-base-dutch-uncased |
|---------------------|-------------------------|-------------------------|---------------------------|----------------------------|
| tokenizer           | cased                   | uncased                 | cased                     | uncased                    |
| source model config | google/t5-base          | google/t5-v1_1-base     | google/t5-v1_1-large      | google/t5-v1_1-base        |
| dataset             | yhavinga/mc4_nl_cleaned | yhavinga/mc4_nl_cleaned | yhavinga/mc4_nl_cleaned   | yhavinga/mc4_nl_cleaned    |
| TPU VM              | two                     | one                     | three                     | one                        |
| finished            | YES                     |                         |                           |                            |
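
The checkpoints can be loaded with Hugging Face Transformers. Below is a minimal usage sketch; the hub repository ids (e.g. `yhavinga/t5-base-dutch`) are assumed from the author's namespace and are not confirmed by this card. Since these models are pretrained only (no downstream fine-tuning yet), the example probes the span-corruption objective with a sentinel token rather than performing a real task.

```python
# Minimal sketch, assuming the checkpoint is published as
# "yhavinga/t5-base-dutch" (the repo id is an assumption).
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("yhavinga/t5-base-dutch")
model = T5ForConditionalGeneration.from_pretrained("yhavinga/t5-base-dutch")

# The models are pretrained with span corruption only, so we ask the
# model to fill in a masked span marked by a sentinel token.
text = "Het weer is vandaag <extra_id_0> en zonnig."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```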
## Hyperparameters

|                         | t5-base-dutch | t5-v1.1-base-dutch | t5-v1.1-large-dutch-cased | t5-v1.1-base-dutch-uncased |
|-------------------------|---------------|--------------------|---------------------------|----------------------------|
| epochs                  | 1             | 1                  | 4                         | 2                          |
| per-device batch size   | 16            | 16                 | 2                         | 8                          |
| total batch size        | 128           | 128                | 16                        | ?                          |
| steps                   | 508 976       | 508 976            | 8 428 012                 | ?                          |
| max seq. length         | 512           | 512                | 1024                      | 1024                       |
| total tokens trained on | 33B           | 33B                | 138B                      | ?                          |
| optimizer               | adafactor     | adafactor          | adafactor                 | adafactor                  |
| warmup steps            | 10000         | 10000              | 10000                     | 10000                      |
| learning rate           | 0.005         | 0.005              | 0.005                     | 0.005                      |
| weight decay            | 0.01          | 0.01               | 0.01                      | 0.001                      |
| tie embeds              | false         | false              | false                     | false                      |
| validation split size   | 15K examples  | 15K examples       | 15K examples              | 15K examples               |
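
The optimizer rows above translate roughly to the sketch below, using the PyTorch Adafactor implementation in `transformers` with a fixed learning rate of 0.005 and 10000 warmup steps (t5-base-dutch column). The actual pretraining presumably ran with Flax/JAX on the TPU VMs listed earlier, so this is an illustration of the listed settings, not the author's training script.

```python
# Hedged optimizer sketch; the numbers come from the hyperparameter table,
# everything else (the stand-in model, the schedule choice) is assumed.
from transformers import Adafactor, T5Config, T5ForConditionalGeneration
from transformers import get_constant_schedule_with_warmup

model = T5ForConditionalGeneration(T5Config())  # stand-in model

optimizer = Adafactor(
    model.parameters(),
    lr=0.005,               # fixed learning rate from the table
    weight_decay=0.01,      # 0.001 for t5-v1.1-base-dutch-uncased
    scale_parameter=False,  # disable relative-step heuristics so the
    relative_step=False,    # explicit lr is actually used
    warmup_init=False,
)
# 10000 linear warmup steps up to the fixed lr, then constant.
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=10000)
```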
## Model config

|                    | t5-base-dutch | t5-v1.1-base-dutch | t5-v1.1-large-dutch-cased     | t5-v1.1-base-dutch-uncased    |
|--------------------|---------------|--------------------|-------------------------------|-------------------------------|
| d_ff               | 3072          | 2048               | 2816                          | 2048                          |
| d_kv               | 64            | 64                 | 64                            | 64                            |
| d_model            | 768           | 768                | 1024                          | 768                           |
| dropout rate       | 0.1           | 0.1                | 0.1 (0.0 during pre-training) | 0.1 (0.0 during pre-training) |
| ff projection      | relu          | gated-gelu         | gated-gelu                    | gated-relu                    |
| num decoder layers | 12            | 12                 | 24                            | 12                            |
| num heads          | 12            | 12                 | 16                            | 12                            |
| num layers         | 12            | 12                 | 24                            | 12                            |
| rel. attn. buckets | 32            | 32                 | 32                            | 32                            |
| vocab size         | 32103         | 32103              | 32103                         | 32103                         |
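
For reference, the t5-v1.1-base-dutch column maps onto a `transformers` `T5Config` roughly as follows; the parameter names are the library's, and `tie_word_embeddings=False` mirrors the "tie embeds" row in the hyperparameter table.

```python
from transformers import T5Config

# Sketch of the t5-v1.1-base-dutch column as a T5Config.
config = T5Config(
    vocab_size=32103,
    d_model=768,
    d_kv=64,
    d_ff=2048,
    num_layers=12,
    num_decoder_layers=12,
    num_heads=12,
    relative_attention_num_buckets=32,
    dropout_rate=0.1,
    feed_forward_proj="gated-gelu",
    tie_word_embeddings=False,
)
```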
## Training time

| t5-base-dutch | t5-v1.1-base-dutch | t5-v1.1-large-dutch-cased | t5-v1.1-base-dutch-uncased |
|---------------|--------------------|---------------------------|----------------------------|
| ~100 hours    | 101 hours          | ~370 hours                | ?                          |
## Evaluation

| metric   | value  |
|----------|--------|
| accuracy | 0.6976 |
| loss     | 1.379  |
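
The card does not state how these metrics were computed; a plausible reading is token-level accuracy and cross-entropy loss on the 15K-example validation split under the span-corruption pretraining objective. A hedged sketch of that computation on a single toy example follows (the repo id and the metric definition are assumptions):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("yhavinga/t5-base-dutch")
model = T5ForConditionalGeneration.from_pretrained("yhavinga/t5-base-dutch")

inputs = tokenizer("Het weer is vandaag <extra_id_0>.", return_tensors="pt")
labels = tokenizer("<extra_id_0> mooi", return_tensors="pt").input_ids

with torch.no_grad():
    out = model(**inputs, labels=labels)

# Cross-entropy loss over the target tokens, and token-level accuracy
# of the greedy (argmax) predictions against the labels.
accuracy = (out.logits.argmax(-1) == labels).float().mean()
print(f"loss={out.loss:.3f} accuracy={accuracy:.4f}")
```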