Commit cef822d by aapot (1 parent: 97bfc81)

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -134,7 +134,7 @@ Contrary to BERT, the masking is done dynamically during pretraining (e.g., it c

 ### Pretraining

- The model was trained on TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 2 epochs with a sequence length of 128 and continuing for one more epoch with a sequence length of 512. The optimizer used for the 128 sequence training was AdamW, and for the 512 sequence training it was Adafactor (to save memory). Learning rate was 2e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and \\(\epsilon = 1e-6\\), learning rate warmup for 1500 steps and linear decay of the learning rate after.
+ The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 520k training steps (2 epochs, batch size 512) with a sequence length of 128, continuing for another 520k steps (1 epoch, batch size 64) with a sequence length of 512. The optimizer used for the 128-sequence training was AdamW, and for the 512-sequence training it was Adafactor (to save memory). The learning rate was 2e-4, with \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\), \\(\epsilon = 1e-6\\), a learning-rate warmup for 1500 steps, and linear decay of the learning rate afterwards.

 ## Evaluation results

@@ -147,7 +147,7 @@ When fine-tuned on those datasets, this model (the first row of the table) achie
 |Finnish-NLP/roberta-large-finnish |88.02 |94.53 |95.23 |74.30 |
 |TurkuNLP/bert-base-finnish-cased-v1 |**88.82** |**94.90** |**95.49** |**76.07** |

- To conclude, this model didn't improve significantly compared to our previous [Finnish-NLP/roberta-large-finnish](https://huggingface.co/Finnish-NLP/roberta-large-finnish) model. This model is also slightly (~ 1%) losing to the [FinBERT (Finnish BERT)](https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1) model.
+ To conclude, this model didn't significantly improve on our previous [Finnish-NLP/roberta-large-finnish](https://huggingface.co/Finnish-NLP/roberta-large-finnish) model, and it also trails the [FinBERT (Finnish BERT)](https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1) model slightly (by roughly 1%).

 ## Team Members
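The updated pretraining paragraph fully specifies the optimizer and schedule for the 128-sequence stage: AdamW, learning rate 2e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\), \\(\epsilon = 1e-6\\), 1500 warmup steps, and linear decay over the 520k-step run. Below is a minimal sketch of that configuration using PyTorch and the `transformers` scheduler helper. This is an illustration only, not the authors' actual TPU training script (the training ran on a TPUv3-8, presumably with a different pipeline), and the placeholder model is hypothetical.

```python
# Minimal sketch of the described optimizer/schedule -- not the authors' training script.
import torch
from transformers import get_linear_schedule_with_warmup

# Hypothetical placeholder module standing in for the RoBERTa model being pretrained.
model = torch.nn.Linear(768, 768)

# AdamW with the stated hyperparameters: lr 2e-4, beta1 0.9, beta2 0.98, eps 1e-6.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=2e-4, betas=(0.9, 0.98), eps=1e-6
)

# 1500 warmup steps followed by linear decay over the 520k-step stage.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1500, num_training_steps=520_000
)

# Inside the training loop, each step would then do:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```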