Update README.md
README.md CHANGED
@@ -132,7 +132,7 @@ Contrary to BERT, the masking is done dynamically during pretraining (e.g., it c

### Pretraining

-The model was trained on a TPUv3-8 VM, sponsored by the Google TPU Research Cloud, for 2 epochs with a sequence length of 128 and continuing for one more epoch with a sequence length of 512. The optimizer used is Adafactor with a learning rate of 2e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and \\(\epsilon = 1e-6\\), learning rate warmup for 1500 steps and linear decay of the learning rate after.
+The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 2 epochs with a sequence length of 128 and continuing for one more epoch with a sequence length of 512. The optimizer used is Adafactor with a learning rate of 2e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and \\(\epsilon = 1e-6\\), learning rate warmup for 1500 steps and linear decay of the learning rate after.

## Evaluation results

@@ -149,6 +149,8 @@ To conclude, this model improves on our previous [Finnish RoBERTa-large](https:/

## Team Members

-- Aapo Tanskanen
-- Rasmus Toivanen
-- Tommi Vehviläinen
+- Aapo Tanskanen, [Hugging Face profile](https://huggingface.co/aapot), [LinkedIn profile](https://www.linkedin.com/in/aapotanskanen/)
+- Rasmus Toivanen, [Hugging Face profile](https://huggingface.co/RASMUS), [LinkedIn profile](https://www.linkedin.com/in/rasmustoivanen/)
+- Tommi Vehviläinen, [Hugging Face profile](https://huggingface.co/Tommi)
+
+Feel free to contact us for more details 🤗
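The schedule described in the pretraining paragraph (linear warmup to a 2e-4 peak over 1500 steps, then linear decay) can be sketched with optax, assuming a JAX/Flax training setup such as the Hugging Face Flax examples; the card does not name the framework. `total_steps` is a placeholder the card does not state, and optax's Adafactor does not expose \\(\beta_{2}\\) or \\(\epsilon\\) under the card's notation, so only the parts it maps directly are set here.

```python
import optax

# Learning-rate schedule from the pretraining paragraph:
# linear warmup to 2e-4 over 1500 steps, then linear decay to zero.
# total_steps is a placeholder; the model card does not state it.
total_steps = 100_000

schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(init_value=0.0, end_value=2e-4,
                              transition_steps=1_500),
        optax.linear_schedule(init_value=2e-4, end_value=0.0,
                              transition_steps=total_steps - 1_500),
    ],
    # Switch from the warmup ramp to the decay ramp at step 1500.
    boundaries=[1_500],
)

# Adafactor driven by the schedule. optax's Adafactor takes `momentum`
# (beta_1-like); its beta_2/epsilon analogues are internal knobs with
# different names, so they are left at their defaults in this sketch.
optimizer = optax.adafactor(learning_rate=schedule, momentum=0.9)
```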