Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed).
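As an illustration (not the authors' actual training code), dynamic masking is what the Hugging Face `DataCollatorForLanguageModeling` provides out of the box: mask positions are re-sampled every time a batch is assembled, so they differ from epoch to epoch. The model ID and the 15% masking probability below are the usual RoBERTa defaults, assumed for this example:

```python
# Illustrative sketch of dynamic masking with the transformers data collator;
# mask positions are drawn anew for every batch, so they change each epoch.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# The model ID and 15% masking rate are assumptions for this example.
tokenizer = AutoTokenizer.from_pretrained("Finnish-NLP/roberta-large-finnish")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```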
### Pretraining
The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 520k training steps (2 epochs, batch size 512) with a sequence length of 128, and then for another 520k steps (1 epoch, batch size 64) with a sequence length of 512. The optimizer used for the 128-sequence training was AdamW; for the 512-sequence training it was Adafactor (to save memory). The learning rate was 2e-4 with \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and \\(\epsilon = 1e-6\\), warmed up for 1500 steps and decayed linearly afterwards.
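As a rough sketch (not the actual training script), the optimizer and schedule for the 128-sequence phase could be set up with optax along these lines; that the linear decay runs to zero over the remaining steps is an assumption:

```python
# Minimal optax sketch of the 128-sequence-phase optimizer described above.
import optax

PEAK_LR = 2e-4
WARMUP_STEPS = 1_500
TOTAL_STEPS = 520_000  # 2 epochs at batch size 512

# Linear warmup for 1500 steps, then linear decay (assumed to reach zero).
lr_schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, PEAK_LR, WARMUP_STEPS),
        optax.linear_schedule(PEAK_LR, 0.0, TOTAL_STEPS - WARMUP_STEPS),
    ],
    boundaries=[WARMUP_STEPS],
)

# AdamW with the hyperparameters quoted above; the 512-sequence phase
# used optax.adafactor instead, to save memory.
optimizer = optax.adamw(learning_rate=lr_schedule, b1=0.9, b2=0.98, eps=1e-6)
```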
## Evaluation results
When fine-tuned on those datasets, this model (the first row of the table) achieves the following accuracy results:
|Finnish-NLP/roberta-large-finnish |88.02 |94.53 |95.23 |74.30 |
|TurkuNLP/bert-base-finnish-cased-v1 |**88.82** |**94.90** |**95.49** |**76.07** |
To conclude, this model did not significantly improve on our previous [Finnish-NLP/roberta-large-finnish](https://huggingface.co/Finnish-NLP/roberta-large-finnish) model, and it also trails the [FinBERT (Finnish BERT)](https://huggingface.co/TurkuNLP/bert-base-finnish-cased-v1) model slightly (by about 1%).
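For reference, a fine-tuning evaluation like the one above would typically start by loading the checkpoint with a classification head. This sketch assumes the standard transformers API; the model ID and label count are placeholders, not the exact benchmark setup:

```python
# Illustrative sketch of loading a checkpoint for classification fine-tuning;
# model_id and num_labels are placeholders for the actual benchmark config.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Finnish-NLP/roberta-large-finnish"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
```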
## Team Members