Yeb Havinga committed
Commit ffd42a4 (1 Parent(s): 5f02f6d)

Autoupdate README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -66,7 +66,7 @@ Three types of models have been trained. `t5-base-dutch` is the only model with
  The other model types t5-v1.1 and t5-eff have `gated-relu` instead of `relu` as the activation function,
  and are trained with a drop-out of `0.0` unless training would diverge (`t5-v1.1-large-dutch-cased`).
  The T5-eff models differ mostly in their number of layers. The table below lists
- the several dimensions of these models. Note that the `efficient` is a misnomer for models with few layers,
+ the several dimensions of these models. Note that `efficient` is a misnomer for models with few layers,
  e.g. `t5-xl-4L-dutch-english-cased`, which is not efficient and is one of the worst models on downstream summarization.

  | | t5-base-dutch | t5-v1.1-base-dutch-uncased | t5-v1.1-base-dutch-cased | t5-v1.1-large-dutch-cased | t5-v1_1-base-dutch-english-cased | t5-v1_1-base-dutch-english-cased-1024 | t5-small-24L-dutch-english | t5-xl-4L-dutch-english-cased | t5-base-36L-dutch-english-cased | t5-eff-xl-8l-dutch-english-cased | t5-eff-large-8l-dutch-english-cased |
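The activation and drop-out settings described in this hunk can be checked directly from the published checkpoints. Below is a minimal sketch, assuming the models are hosted under the `yhavinga` namespace on the Hugging Face Hub; the ids are an assumption and may need adjusting.

```python
# Minimal sketch: print the activation function and drop-out of two checkpoints.
# The `yhavinga/...` hub ids are assumptions; change them if the models live elsewhere.
from transformers import T5Config

for model_id in ["yhavinga/t5-base-dutch", "yhavinga/t5-v1.1-base-dutch-cased"]:
    config = T5Config.from_pretrained(model_id)
    # `feed_forward_proj` is "relu" for the original T5 design and a gated variant
    # for the v1.1/eff checkpoints; `dropout_rate` holds the drop-out used in training.
    print(model_id, config.feed_forward_proj, config.dropout_rate, config.num_layers)
```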
@@ -112,10 +112,11 @@ Article and summary token lengths were set to 1024 and 142.
  ## Translation models

  The small 24L and base 36L models have been fine-tuned for translation on the CCMatrix dataset.
- The models with `multi` support two directions of translation. The models are trained on CCMatrix only. As this is
+ The models named *-`multi` support both directions of translation. The models are trained on CCMatrix only. As this is
  a really large dataset with over 100M Dutch-English sentence pairs, the models are trained on a fraction of it;
  refer to the table below for how long. Evaluation is performed on a CCMatrix section not trained on, but also
- on Tatoeba and Opus Books.
+ on Tatoeba and Opus Books. The `_bp` columns list the *brevity penalty*. The `avg_bleu` score is the bleu score
+ averaged over all three evaluation datasets.

  The translation metrics are listed in the table below:
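As a reading aid for the `_bp` and `avg_bleu` columns: BLEU's brevity penalty is 1.0 when the system output is at least as long as the reference and exp(1 - ref_len / hyp_len) otherwise, and `avg_bleu` is the unweighted mean of the per-dataset scores. The numbers in this minimal sketch are placeholders, not values from the table.

```python
import math

def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    # Standard BLEU brevity penalty: no penalty when the output is at least as
    # long as the reference, exp(1 - ref_len / hyp_len) when it is shorter.
    return 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)

# Hypothetical per-dataset BLEU scores, not taken from the table.
bleu = {"ccmatrix": 40.0, "tatoeba": 50.0, "opus_books": 20.0}
avg_bleu = sum(bleu.values()) / len(bleu)

print(round(brevity_penalty(hyp_len=95, ref_len=100), 4))  # 0.9487
print(round(avg_bleu, 2))                                   # 36.67
```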
 
@@ -143,7 +144,8 @@
  This project would not have been possible without compute generously provided by Google through the
  [TPU Research Cloud](https://sites.research.google/trc/). The HuggingFace 🤗 ecosystem was also
  instrumental in all parts of the training. Logging metrics to Weights & Biases made it possible to keep track of many
- models and also some hyper-paramater sweeps, I could not imagine how I would have completed this work otherwise.
+ models and orchestrate hyper-parameter sweeps with insightful visualizations. I cannot imagine how I would
+ have completed this project otherwise.
  The following repositories were helpful in setting up the TPU-VM,
  and getting an idea of what sensible hyper-parameters are for training gpt2 from scratch.
 