Update README.md
README.md CHANGED
@@ -5,7 +5,7 @@ language: sv
 # A Swedish Bert model
 
 ## Model description
-This model has the same architecture as the Bert Large model in [this paper](https://arxiv.org/abs/1810.04805). It is implemented with the Megatron Bert architecture, containing the following parameters:
+This model has the same architecture as the Bert Large model in [this paper](https://arxiv.org/abs/1810.04805). It was trained with a batch size of 512 for 600k steps. It is implemented with the Megatron Bert architecture, containing the following parameters:
 <figure>
 
 | Hyperparameter | Value |
@@ -18,7 +18,7 @@ This model has the same architecture as the Bert Large model in [this paper](htt
 
 
 ## Training data
-
+The model is pretrained on a Swedish text corpus of around 80 GB from a variety of sources, as shown below.
 <figure>
 
 | Dataset | Genre | Size (GB) |
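The card describes a Megatron-Bert checkpoint but the diff adds no usage snippet, so here is a minimal sketch of masked-token inference with the Hugging Face transformers library. The model identifier is a placeholder, not taken from this commit, and the actual repo name may differ.

```python
# Minimal sketch: masked-token inference with a Megatron-Bert checkpoint
# via Hugging Face transformers. The model identifier below is a
# placeholder (hypothetical), not confirmed by this commit.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "some-org/swedish-megatron-bert-large"  # hypothetical repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

# A Swedish sentence with one masked token.
text = f"Huvudstaden i Sverige är {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the highest-scoring token.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```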