Kaspar committed on
Commit a2502c1
1 Parent(s): e40f0e6

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -151,16 +151,23 @@ ERWT clearly learned a lot about history of German unification by ploughing thro
 
  Again, we have to ask: Who cares? Wikipedia can tell us pretty much the same. More importantly, don't we already have timestamps for newspaper data.
 
- In both cases, our answers would be "yes, but...". ERWT's time-stamping powers has little instrumental use and won't make us rich (but donations are welcome of course 🤑) we nonetheless believe date prediction has value for research purposes. We can use ERWT for "fictitious" prediction, i.e. as a diagnostic tool.
+ In both cases, our answers would be "yes, but...". ERWT's time-stamping powers have little instrumental use and won't make us rich (but donations are welcome of course 🤑). Nonetheless, we believe date prediction has value for research purposes. We can use ERWT for "fictitious" prediction, i.e. as a diagnostic tool.
 
  Firstly, we used date prediction for evaluation purposes, to measure which training routine produces models
  Secondly, we could use it as an analytical tool, to study how temporal variation **within** text documents and further scrutinise which features drive the time prediction (it goes without saying that the same applies to other metadata fields, but example predicting political orientation).
 
  ## Limitations
 
- The ERWT series were trained for evaluation purposes, and cary critical limitations. First of all, as explained in more detail below, this model is trained on a rather small subsample of British newspapers, with a strong Metropolitan and liberal bias.
 
- Secondly, we only trained for one epoch, which suggests. For the evaluation purposes we were interested in the relative performance of our models.
+ The ERWT series were trained for evaluation purposes, and carry some critical limitations.
+
+ ### Training Data
+
+ Many of the limitations are a direct result of the data. ERWT models are trained on a rather small subsample of nineteenth-century British newspapers, and its predictions have to be understood in this context (remember, Her Majesty?). Moreover, the corpus has a strong Metropolitan and liberal bias (see section on Data Description for more information).
+
+
+
+ We only trained for one epoch, which suggests. For the evaluation purposes we were interested in the relative performance of our models.
 
  ## Data Description
 