Yeb Havinga committed • Commit ffd42a4 • 1 Parent(s): 5f02f6d

Autoupdate README.md

README.md CHANGED
@@ -66,7 +66,7 @@ Three types of models have been trained. `t5-base-dutch` is the only model with
 The other model types t5-v1.1 and t5-eff have `gated-relu` instead of `relu` as the activation function,
 and are trained with a dropout of `0.0` unless training would diverge (`t5-v1.1-large-dutch-cased`).
 The t5-eff models differ mostly in the number of layers. The table below lists
-the several dimensions of these models. Note that
+the several dimensions of these models. Note that `efficient` is a misnomer for models with few layers,
 e.g. `t5-xl-4L-dutch-english-cased`, which is not efficient and is one of the worst models on downstream summarization.

 | | t5-base-dutch | t5-v1.1-base-dutch-uncased | t5-v1.1-base-dutch-cased | t5-v1.1-large-dutch-cased | t5-v1_1-base-dutch-english-cased | t5-v1_1-base-dutch-english-cased-1024 | t5-small-24L-dutch-english | t5-xl-4L-dutch-english-cased | t5-base-36L-dutch-english-cased | t5-eff-xl-8l-dutch-english-cased | t5-eff-large-8l-dutch-english-cased |
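As background for the `gated-relu` activation mentioned in this hunk: in a gated feed-forward block the ReLU branch is multiplied element-wise by a second linear projection before the down-projection, instead of a single up-projection passing through ReLU. A minimal sketch of the difference, with toy dimensions and random weights standing in for learned ones:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ff_relu(x, w_in, w_out):
    # Classic T5 feed-forward: up-project, apply ReLU, down-project.
    return relu(x @ w_in) @ w_out

def ff_gated_relu(x, w_in_0, w_in_1, w_out):
    # Gated variant (t5-v1.1 / t5-eff style): the ReLU branch is gated
    # element-wise by a second, activation-free up-projection.
    return (relu(x @ w_in_0) * (x @ w_in_1)) @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))          # batch of 2, d_model = 4
w_out = rng.normal(size=(8, 4))      # d_ff = 8
print(ff_relu(x, rng.normal(size=(4, 8)), w_out).shape)           # (2, 4)
print(ff_gated_relu(x, rng.normal(size=(4, 8)),
                    rng.normal(size=(4, 8)), w_out).shape)        # (2, 4)
```

The gated block has roughly 50% more feed-forward parameters at the same `d_ff`, which is why the v1.1 recipes shrink `d_ff` relative to the original T5.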
@@ -112,10 +112,11 @@ Article and summary token lengths were set to 1024 and 142.
 ## Translation models

 The small 24L and base 36L models have been fine-tuned for translation on the CCMatrix dataset.
-The models
+The models whose names end in `-multi` support both directions of translation. The models are trained on CCMatrix only. As this is
 a really large dataset with over 100M Dutch-English sentence pairs, the models are trained on a fraction of it;
 refer to the table below for how long. Evaluation is performed on a held-out CCMatrix section, and also
-on Tatoeba and Opus Books.
+on Tatoeba and Opus Books. The `_bp` columns list the *brevity penalty*. The `avg_bleu` score is the BLEU score
+averaged over all three evaluation datasets.

 The translation metrics are listed in the table below:

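The `_bp` and `avg_bleu` columns introduced in this hunk follow standard BLEU conventions. A minimal sketch of how such numbers could be computed with `sacrebleu` (the actual evaluation script is not part of this diff; the sentences below are placeholders for the real CCMatrix, Tatoeba and Opus Books test sets):

```python
import sacrebleu  # pip install sacrebleu

# Placeholder hypothesis/reference pairs, one tiny set per evaluation dataset.
datasets = {
    "ccmatrix":   (["the cat sits on the mat"], [["the cat sits on the mat"]]),
    "tatoeba":    (["he reads a book"],         [["he is reading a book"]]),
    "opus_books": (["it was a dark night"],     [["it was a dark and stormy night"]]),
}

scores = {}
for name, (hyps, refs) in datasets.items():
    bleu = sacrebleu.corpus_bleu(hyps, refs)
    scores[name] = bleu.score
    # bleu.bp is the brevity penalty: 1.0 unless the hypotheses are shorter
    # than the references, in which case BLEU is scaled down.
    print(f"{name}: bleu={bleu.score:.1f} bp={bleu.bp:.3f}")

print(f"avg_bleu={sum(scores.values()) / len(scores):.1f}")
```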
@@ -143,7 +144,8 @@ The translation metrics are listed in the table below:
 This project would not have been possible without compute generously provided by Google through the
 [TPU Research Cloud](https://sites.research.google/trc/). The HuggingFace 🤗 ecosystem was also
 instrumental in all parts of the training. Logging metrics to Weights & Biases made it possible to keep track of many
-models and
+models and orchestrate hyper-parameter sweeps with insightful visualizations. I cannot imagine how I would
+have completed this project otherwise.
 The following repositories were helpful in setting up the TPU-VM,
 and getting an idea of what sensible hyper-parameters are for training gpt2 from scratch.

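As an aside on the Weights & Biases workflow credited above: logging and sweep orchestration follow the standard `wandb` pattern. A minimal sketch with a hypothetical project name and a stub in place of the real training function (the project's actual sweep configuration is not shown in this commit):

```python
import wandb  # pip install wandb; requires `wandb login` first

def train():
    # Stub: a real run would train a T5 model and log its actual metrics.
    run = wandb.init(project="t5-dutch")     # hypothetical project name
    lr = wandb.config.learning_rate          # filled in by the sweep agent
    for step in range(3):
        wandb.log({"train/loss": 2.0 / (lr * 1e4 * (step + 1))})
    run.finish()

# Random search over the learning rate, minimizing the logged loss.
sweep_config = {
    "method": "random",
    "metric": {"name": "train/loss", "goal": "minimize"},
    "parameters": {"learning_rate": {"values": [1e-3, 5e-4, 1e-4]}},
}
sweep_id = wandb.sweep(sweep_config, project="t5-dutch")
wandb.agent(sweep_id, function=train, count=3)
```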