satyaalmasian committed
Commit a523f78 · Parent(s): e0813d6
Update README.md
README.md CHANGED
@@ -37,7 +37,7 @@ For an example with post-processing, refer to the [repository](https://github.co
We provide a function `merge_tokens` to decipher the output.
To further fine-tune, use the `Trainer` from Hugging Face. An example of a similar fine-tuning can be found [here](https://github.com/satya77/Transformer_Temporal_Tagger/blob/master/run_token_classifier.py).

-#Training data
+# Training data

For pre-training, we use a large corpus of news articles automatically annotated with HeidelTime.

@@ -45,7 +45,7 @@ We use two data sources for fine-tuning:
[Tempeval-3](https://www.cs.york.ac.uk/semeval-2013/task1/index.php%3Fid=data.html), automatically translated to German,
[KRAUTS dataset](https://github.com/JannikStroetgen/KRAUTS).

-#Training procedure
+# Training procedure
The model is trained from a publicly available checkpoint on Hugging Face (`deepset/gelectra-large`), with a batch size of 192. We use a learning rate of 1e-07 with an Adam optimizer and linear weight decay for pre-training.
For fine-tuning we use a batch size of 16. We use a learning rate of 5e-05 with an Adam optimizer and linear weight decay.
We fine-tune with 3 different random seeds; this version of the model uses seed=7.
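As context for the README lines above that mention `merge_tokens` and the raw token-classification output: the sketch below shows plain inference with `transformers`. The model ID is a placeholder assumption, and the per-subword printout only illustrates the kind of output that the repository's `merge_tokens` helper is meant to regroup; it is not that function's implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint name -- substitute the actual model ID of this model card.
MODEL_ID = "satyaalmasian/temporal_tagger_German_GELECTRA"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

# German example sentence containing a date expression.
text = "Am 3. Oktober 1990 wurde Deutschland wiedervereinigt."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Highest-scoring label for every subword token.
label_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, label_ids):
    print(token, model.config.id2label[label_id])

# The repository's `merge_tokens` helper regroups these per-subword labels into
# word-level temporal expressions; see the linked repository for its implementation.
```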
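The fine-tuning hyperparameters in the "Training procedure" section map almost directly onto `TrainingArguments`. Below is a minimal, hedged sketch of further fine-tuning with the Hugging Face `Trainer`; the model ID, epoch count, output directory, and toy dataset are assumptions rather than the repository's actual setup (see the linked run_token_classifier.py for that), and the README's "linear weight decay" is read here as the Trainer's default linear learning-rate schedule with AdamW.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "satyaalmasian/temporal_tagger_German_GELECTRA"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

# Toy training examples with dummy all-"O" labels so the sketch runs end to end;
# in practice, prepare TempEval-3/KRAUTS as in the repository's run_token_classifier.py.
train_dataset = []
for text in ["Heute ist ein schöner Tag.", "Am Montag beginnt das Seminar."]:
    enc = tokenizer(text, truncation=True)
    enc["labels"] = [0] * len(enc["input_ids"])
    train_dataset.append(enc)

# Batch size, learning rate, and seed mirror the "Training procedure" section;
# the epoch count and output directory are assumptions.
args = TrainingArguments(
    output_dir="gelectra-temporal-finetuned",
    per_device_train_batch_size=16,
    learning_rate=5e-05,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    seed=7,  # the seed of this released model version
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```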