satyaalmasian committed
Commit a523f78 · Parent(s): e0813d6
Update README.md
README.md CHANGED
@@ -37,7 +37,7 @@ For an example with post-processing, refer to the [repository](https://github.co
We provide a function `merge_tokens` to decipher the output.
To further fine-tune, use the `Trainer` from Hugging Face. An example of a similar fine-tuning can be found [here](https://github.com/satya77/Transformer_Temporal_Tagger/blob/master/run_token_classifier.py).

-#Training data
+# Training data

For pre-training, we use a large corpus of news articles automatically annotated with HeidelTime.

@@ -45,7 +45,7 @@ We use two data sources for fine-tuning:
[Tempeval-3](https://www.cs.york.ac.uk/semeval-2013/task1/index.php%3Fid=data.html), automatically translated to German,
[KRAUTS dataset](https://github.com/JannikStroetgen/KRAUTS).

-#Training procedure
+# Training procedure
The model is trained from a publicly available checkpoint on Hugging Face (`deepset/gelectra-large`), with a batch size of 192. We use a learning rate of 1e-07 with an Adam optimizer and linear weight decay for pre-training.
For fine-tuning we use a batch size of 16. We use a learning rate of 5e-05 with an Adam optimizer and linear weight decay.
We fine-tune with 3 different random seeds; this version of the model uses seed=7.
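As context for the README lines above that mention `merge_tokens` and the raw token-classification output: the sketch below shows plain inference with `transformers`. The model ID is a placeholder assumption, and the per-subword printout only illustrates the kind of output that the repository's `merge_tokens` helper is meant to regroup; it is not that function's implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint name -- substitute the actual model ID of this model card.
MODEL_ID = "satyaalmasian/temporal_tagger_German_GELECTRA"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

# German example sentence containing a date expression.
text = "Am 3. Oktober 1990 wurde Deutschland wiedervereinigt."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Highest-scoring label for every subword token.
label_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, label_ids):
    print(token, model.config.id2label[label_id])

# The repository's `merge_tokens` helper regroups these per-subword labels into
# word-level temporal expressions; see the linked repository for its implementation.
```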
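The fine-tuning hyperparameters in the "Training procedure" section map almost directly onto `TrainingArguments`. Below is a minimal, hedged sketch of further fine-tuning with the Hugging Face `Trainer`; the model ID, epoch count, output directory, and toy dataset are assumptions rather than the repository's actual setup (see the linked run_token_classifier.py for that), and the README's "linear weight decay" is read here as the Trainer's default linear learning-rate schedule with AdamW.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "satyaalmasian/temporal_tagger_German_GELECTRA"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

# Toy training examples with dummy all-"O" labels so the sketch runs end to end;
# in practice, prepare TempEval-3/KRAUTS as in the repository's run_token_classifier.py.
train_dataset = []
for text in ["Heute ist ein schöner Tag.", "Am Montag beginnt das Seminar."]:
    enc = tokenizer(text, truncation=True)
    enc["labels"] = [0] * len(enc["input_ids"])
    train_dataset.append(enc)

# Batch size, learning rate, and seed mirror the "Training procedure" section;
# the epoch count and output directory are assumptions.
args = TrainingArguments(
    output_dir="gelectra-temporal-finetuned",
    per_device_train_batch_size=16,
    learning_rate=5e-05,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    seed=7,  # the seed of this released model version
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```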