Yeb Havinga committed • Commit ffd42a4 • 1 Parent(s): 5f02f6d

Autoupdate README.md

README.md CHANGED
@@ -66,7 +66,7 @@ Three types of models have been trained. `t5-base-dutch` is the only model with
 The other model types t5-v1.1 and t5-eff have `gated-relu` instead of `relu` as the activation function,
 and are trained with a dropout of `0.0` unless training would diverge (`t5-v1.1-large-dutch-cased`).
 The t5-eff models differ mostly in the number of layers. The table below lists
-the several dimensions of these models. Note that
+the several dimensions of these models. Note that `efficient` is a misnomer for models with few layers,
 e.g. `t5-xl-4L-dutch-english-cased`, which is not efficient and is one of the worst models on downstream summarization.

 | | t5-base-dutch | t5-v1.1-base-dutch-uncased | t5-v1.1-base-dutch-cased | t5-v1.1-large-dutch-cased | t5-v1_1-base-dutch-english-cased | t5-v1_1-base-dutch-english-cased-1024 | t5-small-24L-dutch-english | t5-xl-4L-dutch-english-cased | t5-base-36L-dutch-english-cased | t5-eff-xl-8l-dutch-english-cased | t5-eff-large-8l-dutch-english-cased |
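As background for the `gated-relu` activation mentioned in this hunk: in a gated feed-forward block the ReLU branch is multiplied element-wise by a second linear projection before the down-projection, instead of a single up-projection passing through ReLU. A minimal sketch of the difference, with toy dimensions and random weights standing in for learned ones:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ff_relu(x, w_in, w_out):
    # Classic T5 feed-forward: up-project, apply ReLU, down-project.
    return relu(x @ w_in) @ w_out

def ff_gated_relu(x, w_in_0, w_in_1, w_out):
    # Gated variant (t5-v1.1 / t5-eff style): the ReLU branch is gated
    # element-wise by a second, activation-free up-projection.
    return (relu(x @ w_in_0) * (x @ w_in_1)) @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))          # batch of 2, d_model = 4
w_out = rng.normal(size=(8, 4))      # d_ff = 8
print(ff_relu(x, rng.normal(size=(4, 8)), w_out).shape)           # (2, 4)
print(ff_gated_relu(x, rng.normal(size=(4, 8)),
                    rng.normal(size=(4, 8)), w_out).shape)        # (2, 4)
```

The gated block has roughly 50% more feed-forward parameters at the same `d_ff`, which is why the v1.1 recipes shrink `d_ff` relative to the original T5.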
@@ -112,10 +112,11 @@ Article and summary token lengths were set to 1024 and 142.
 ## Translation models

 The small 24L and base 36L models have been fine-tuned for translation on the CCMatrix dataset.
-The models
+The models whose names end in `-multi` support both directions of translation. The models are trained on CCMatrix only. As this is
 a really large dataset with over 100M Dutch-English sentence pairs, the models are trained on a fraction of it;
 refer to the table below for how long. Evaluation is performed on a held-out CCMatrix section, and also
-on Tatoeba and Opus Books.
+on Tatoeba and Opus Books. The `_bp` columns list the *brevity penalty*. The `avg_bleu` score is the BLEU score
+averaged over all three evaluation datasets.

 The translation metrics are listed in the table below:

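The `_bp` and `avg_bleu` columns introduced in this hunk follow standard BLEU conventions. A minimal sketch of how such numbers could be computed with `sacrebleu` (the actual evaluation script is not part of this diff; the sentences below are placeholders for the real CCMatrix, Tatoeba and Opus Books test sets):

```python
import sacrebleu  # pip install sacrebleu

# Placeholder hypothesis/reference pairs, one tiny set per evaluation dataset.
datasets = {
    "ccmatrix":   (["the cat sits on the mat"], [["the cat sits on the mat"]]),
    "tatoeba":    (["he reads a book"],         [["he is reading a book"]]),
    "opus_books": (["it was a dark night"],     [["it was a dark and stormy night"]]),
}

scores = {}
for name, (hyps, refs) in datasets.items():
    bleu = sacrebleu.corpus_bleu(hyps, refs)
    scores[name] = bleu.score
    # bleu.bp is the brevity penalty: 1.0 unless the hypotheses are shorter
    # than the references, in which case BLEU is scaled down.
    print(f"{name}: bleu={bleu.score:.1f} bp={bleu.bp:.3f}")

print(f"avg_bleu={sum(scores.values()) / len(scores):.1f}")
```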
@@ -143,7 +144,8 @@ The translation metrics are listed in the table below:
 This project would not have been possible without compute generously provided by Google through the
 [TPU Research Cloud](https://sites.research.google/trc/). The HuggingFace 🤗 ecosystem was also
 instrumental in all parts of the training. Logging metrics to Weights & Biases made it possible to keep track of many
-models and
+models and orchestrate hyper-parameter sweeps with insightful visualizations. I cannot imagine how I would
+have completed this project otherwise.
 The following repositories were helpful in setting up the TPU-VM,
 and getting an idea of what sensible hyper-parameters are for training gpt2 from scratch.

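As an aside on the Weights & Biases workflow credited above: logging and sweep orchestration follow the standard `wandb` pattern. A minimal sketch with a hypothetical project name and a stub in place of the real training function (the project's actual sweep configuration is not shown in this commit):

```python
import wandb  # pip install wandb; requires `wandb login` first

def train():
    # Stub: a real run would train a T5 model and log its actual metrics.
    run = wandb.init(project="t5-dutch")     # hypothetical project name
    lr = wandb.config.learning_rate          # filled in by the sweep agent
    for step in range(3):
        wandb.log({"train/loss": 2.0 / (lr * 1e4 * (step + 1))})
    run.finish()

# Random search over the learning rate, minimizing the logged loss.
sweep_config = {
    "method": "random",
    "metric": {"name": "train/loss", "goal": "minimize"},
    "parameters": {"learning_rate": {"values": [1e-3, 5e-4, 1e-4]}},
}
sweep_id = wandb.sweep(sweep_config, project="t5-dutch")
wandb.agent(sweep_id, function=train, count=3)
```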