We created this model as part of a wider experiment, which attempted to establish best practices for training models with metadata. An overview of all the models is available on our [GitHub](https://github.com/Living-with-machines/ERWT/) page.

To reduce training time, we based our experiments on a random subsample of the HMD corpus, consisting of half a billion tokens.

Furthermore, we only trained the models for one epoch, which implies they are most likely undertrained at the moment.

We were mainly interested in the **relative** performance of the different ERWT models. We did, however, compare ERWT with [`distilbert-base-cased`](https://huggingface.co/distilbert-base-cased) in our evaluation experiments, and of course, our tiny LM peas did much better. 🥳
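
As an illustrative sketch, such a side-by-side comparison can be run with the `transformers` fill-mask pipeline. The ERWT checkpoint id and the example sentence below are assumptions made for illustration; substitute the model you actually want to evaluate.

```python
# Minimal sketch of a side-by-side masked-token comparison.
# Assumption: the ERWT checkpoint id below is illustrative only;
# swap in the ERWT model you want to evaluate.
from transformers import pipeline

models = {
    "ERWT": pipeline("fill-mask", model="Livingwithmachines/erwt-year"),  # assumed id
    "DistilBERT": pipeline("fill-mask", model="distilbert-base-cased"),
}

for name, fill in models.items():
    # Build the prompt with each model's own mask token, since it can differ.
    sentence = f"The {fill.tokenizer.mask_token} opened a new railway station in town."
    print(name)
    for pred in fill(sentence)[:3]:  # top 3 of the default top-5 predictions
        print(f"  {pred['token_str']!r}  score={pred['score']:.3f}")
```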
## Data Description