Commit 9d7eb61 by Kaspar • 1 Parent(s): a806757

Update README.md

Files changed (1):
  1. README.md +19 -4

README.md CHANGED
@@ -31,6 +31,21 @@ This model is served to you by [Kaspar von Beelen](https://huggingface.co/Kaspar

\*ERWT is Dutch for PEA.

+ # Overview
+
+ - [Introduction: Repent Now 😇](#introductory-note-repent-now-%F0%9F%98%87)
+ - [Background: MDMA to the rescue 🙂](#background-mdma-to-the-rescue-%F0%9F%99%82)
+ - [Intended Use: LMs as History Machines 🚂](#intended-use-lms-as-history-machines)
+ - [Historical Language Change: Her/His Majesty? 👑](#historical-language-change-herhis-majesty-%F0%9F%91%91)
+ - [Date Prediction: Pub Quiz with LMs 🍻](#date-prediction)
+ - [Limitations: Not all is well 😮](#limitations)
+ - [Training Data](#training-data)
+ - [Training Routine](#training-routine)
+ - [Data Description](#data-description)
+ - [Evaluation: 🤓 In case you care to count 🤓](#evaluation)
+
+
+
## Introductory Note: Repent Now. 😇

The ERWT models are trained for **experimental purposes**; please use them with care.

@@ -56,7 +71,7 @@ These formatted chunks of text are then forwarded to the data collator and event

Exposed to both the tokens and (temporal) metadata, the model learns a relation between text and time. When a token is masked, the prepended `year` field is taken into account when predicting hidden words in the text. Vice versa, when the metadata token is hidden, the model aims to predict the year of publication based on the content.

- ## Intended Uses: LMs as History Machines.
+ ## Intended Use: LMs as History Machines.

Exposing the model to temporal metadata allows us to investigate **historical language change** and perform **date prediction**.

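The behaviour described in the hunk above is easiest to see with the Hugging Face `transformers` fill-mask pipeline. The sketch below is illustrative rather than code from this repository: the checkpoint id `Livingwithmachines/erwt-year` and the `"<year> [DATE] <text>"` input template are assumptions to verify against the model card.

```python
# Minimal sketch of year-conditioned masked-word prediction.
# ASSUMPTIONS (check the model card): the checkpoint id below and the
# "<year> [DATE] <text>" input template are illustrative, not taken from this diff.
from transformers import pipeline

mask_filler = pipeline("fill-mask", model="Livingwithmachines/erwt-year")

# The same masked sentence prefixed with different years: the prepended
# temporal metadata should shift which fillers the model prefers.
for year in (1810, 1850, 1895):
    text = f"{year} [DATE] [MASK] Majesty opened the new railway station."
    top = mask_filler(text, top_k=3)
    print(year, [(pred["token_str"], round(pred["score"], 3)) for pred in top])
```

Given the Victorian skew of the training data discussed further down in the diff, one would expect "her" to dominate for mid-century prompts; how much the year prefix moves the prediction is exactly what the ERWT experiments probe.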
 
@@ -113,7 +128,7 @@ Firstly, eyeballing some toy examples (but also using more rigorous metrics such

Secondly, MDMA may reduce biases induced by imbalances in the training data (or at least give us more of a handle on this problem). Admittedly, we have to prove this more formally, but some experiments at least hint in this direction. The data used for training is highly biased towards the Victorian age, and a standard language model trained on this corpus will predict "her" for ```"[MASK] Majesty"```.

- ### Date Prediction
+ ### Date Prediction: Pub Quiz with LMs

Another feature of the ERWT model series is date prediction. Remember that during training the temporal metadata token is often masked. In this case, the model effectively learns to situate documents in time based on the tokens they contain.

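Date prediction, as the hunk above describes it, is the symmetric operation: mask the prepended temporal metadata token and read off the highest-scoring fillers. Again a minimal sketch under the same assumed checkpoint id and `[DATE]` template, neither of which is confirmed by this diff.

```python
# Minimal sketch of date prediction: mask the metadata token instead of a word.
# Same ASSUMPTIONS as above: checkpoint id and "[DATE]" template are illustrative.
from transformers import pipeline

mask_filler = pipeline("fill-mask", model="Livingwithmachines/erwt-year")

article = "[MASK] [DATE] The Queen opened the Great Exhibition in Hyde Park."
for pred in mask_filler(article, top_k=5):
    print(pred["token_str"], round(pred["score"], 3))

# To score only plausible years (assuming they are single tokens in the
# model's vocabulary), the pipeline's `targets` argument can restrict the
# candidates, e.g.:
# mask_filler(article, targets=[str(y) for y in range(1800, 1871)])
```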
 
@@ -156,7 +171,7 @@ In both cases, our answers would be "yes, but...". ERWT's time-stamping powers h
Firstly, we used date prediction for evaluation purposes, to measure which training routine produces models
Secondly, we could use it as an analytical tool, to study temporal variation **within** text documents and to further scrutinise which features drive the time prediction (it goes without saying that the same applies to other metadata fields, for example predicting political orientation).

- ## Limitations
+ ## Limitations: Not all is well 😮.

The ERWT series was trained for evaluation purposes and therefore carries some critical limitations.

@@ -207,7 +222,7 @@ Temporally, most of the articles date from the second half of the nineteenth cen

![number of articles by year](https://github.com/Living-with-machines/ERWT/raw/main/articles_by_year.png)

- ## Evaluation
+ ## Evaluation: 🤓 In case you care to count 🤓

Our article ["Metadata Might Make Language Models Better"](https://drive.google.com/file/d/1Xp21KENzIeEqFpKvO85FkHynC0PNwBn7/view?usp=sharing) comprises quite an extensive evaluation of all the language models created with MDMA. For details, we recommend you read and cite the current working papers.
 