Update README.md
README.md
CHANGED
@@ -151,16 +151,23 @@ ERWT clearly learned a lot about history of German unification by ploughing thro
 
 Again, we have to ask: Who cares? Wikipedia can tell us pretty much the same. More importantly, don't we already have timestamps for newspaper data?
 
-In both cases, our answers would be "yes, but...". ERWT's time-stamping powers
+In both cases, our answers would be "yes, but...". ERWT's time-stamping powers have little instrumental use and won't make us rich (but donations are welcome, of course 🤑). Nonetheless, we believe date prediction has value for research purposes. We can use ERWT for "fictitious" prediction, i.e. as a diagnostic tool.
 
 Firstly, we used date prediction for evaluation purposes, to measure which training routine produces models
 Secondly, we could use it as an analytical tool, to study temporal variation **within** text documents and to scrutinise further which features drive the time prediction (it goes without saying that the same applies to other metadata fields, for example predicting political orientation).
 
 ## Limitations
 
-The ERWT series were trained for evaluation purposes, and cary critical limitations. First of all, as explained in more detail below, this model is trained on a rather small subsample of British newspapers, with a strong Metropolitan and liberal bias.
 
-
+The ERWT series were trained for evaluation purposes and carry some critical limitations.
+
+### Training Data
+
+Many of the limitations are a direct result of the data. ERWT models are trained on a rather small subsample of nineteenth-century British newspapers, and their predictions have to be understood in this context (remember, Her Majesty?). Moreover, the corpus has a strong Metropolitan and liberal bias (see the Data Description section for more information).
+
+
+
+We only trained for one epoch, which suggests. For evaluation purposes, we were interested in the relative performance of our models.
 
 ## Data Description
 