satyaalmasian committed on
Commit a523f78
1 Parent(s): e0813d6

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -37,7 +37,7 @@ for an example with post-processing, refer to the [repository](https://github.co
 We provide a function `merge_tokens` to decipher the output.
 to further fine-tune, use the `Trainer` from hugginface. An example of a similar fine-tuning can be found [here](https://github.com/satya77/Transformer_Temporal_Tagger/blob/master/run_token_classifier.py).
 
-#Training data
+# Training data
 
 For pre-training we use a large corpus of automatically annotated news articles with heideltime.
 
@@ -45,7 +45,7 @@ We use 2 data sources for fine-tunning. :
 [Tempeval-3](https://www.cs.york.ac.uk/semeval-2013/task1/index.php%3Fid=data.html),automatically translated to gemran,
 [KRAUTS dataset](https://github.com/JannikStroetgen/KRAUTS).
 
-#Training procedure
+# Training procedure
 The model is trained from publicly available checkpoints on huggingface (`deepset/gelectra-large`), with a batch size of 192. We use a learning rate of 1e-07 with an Adam optimizer and linear weight decay for pretraining.
 For fine-tuning we use a batch size of 16. We use a learning rate of 5e-05 with an Adam optimizer and linear weight decay.
 We fine-tune with 3 different random seeds, this version of the model is the only seed=7.
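
For reference, the fine-tuning setup described in the README text above maps onto the Hugging Face `Trainer` roughly as in the sketch below. Only the base checkpoint (`deepset/gelectra-large`), batch size 16, learning rate 5e-05, linear decay, and seed 7 come from the diff; the output path, label count, and the `train_dataset`/`eval_dataset` objects are hypothetical placeholders, not part of the original commit or repository.

```python
# Minimal sketch of the fine-tuning described above, using the Hugging Face Trainer.
# NUM_LABELS, train_dataset and eval_dataset are assumed placeholders; the real values
# come from the token-classification data prepared in the Transformer_Temporal_Tagger repo.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

NUM_LABELS = 2  # placeholder: replace with the size of the tagger's label set

tokenizer = AutoTokenizer.from_pretrained("deepset/gelectra-large")
model = AutoModelForTokenClassification.from_pretrained(
    "deepset/gelectra-large", num_labels=NUM_LABELS
)

training_args = TrainingArguments(
    output_dir="gelectra-temporal-tagger",  # hypothetical output path
    per_device_train_batch_size=16,         # batch size 16, as in the README
    learning_rate=5e-05,                    # learning rate from the README
    lr_scheduler_type="linear",             # linear decay schedule, as in the README
    seed=7,                                 # this checkpoint corresponds to seed 7
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized, labeled training split
    eval_dataset=eval_dataset,    # placeholder: tokenized, labeled evaluation split
    tokenizer=tokenizer,
)
trainer.train()
```

For a complete, runnable version including data preparation, see the `run_token_classifier.py` script linked in the diff above.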