---
language: sv
---
# A Swedish BERT model
## Model description
This model follows the BERT Large architecture as implemented in the [Megatron-LM framework](https://github.com/NVIDIA/Megatron-LM). It was trained with a batch size of 512 for 600k steps. The model has the following parameters:
| Hyperparameter | Value |
|----------------------|------------|
| \\(n_{parameters}\\) | 340M |
| \\(n_{layers}\\) | 24 |
| \\(n_{heads}\\) | 16 |
| \\(n_{ctx}\\) | 1024 |
| \\(n_{vocab}\\) | 30592 |
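
These values can be verified against the published checkpoint's configuration. A minimal sketch, assuming the standard `transformers` BERT config attribute names:

```python
from transformers import AutoConfig

# Fetch the model configuration from the Hugging Face Hub
config = AutoConfig.from_pretrained("AI-Nordics/bert-large-swedish-cased")

# Standard BERT config attribute names, assumed to correspond to the table above
print(config.num_hidden_layers)        # n_layers, expected 24
print(config.num_attention_heads)      # n_heads, expected 16
print(config.max_position_embeddings)  # n_ctx, expected 1024
print(config.vocab_size)               # n_vocab, expected 30592
```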
## Training data
The model was pretrained on a Swedish text corpus of around 85 GB drawn from the variety of sources shown below.
| Dataset          | Genre    | Size (GB) |
|------------------|----------|-----------|
| Anföranden       | Politics | 0.9       |
| DCEP             | Politics | 0.6       |
| DGT              | Politics | 0.7       |
| Fass             | Medical  | 0.6       |
| Författningar    | Legal    | 0.1       |
| Web data         | Misc     | 45.0      |
| JRC              | Legal    | 0.4       |
| Litteraturbanken | Books    | 0.3       |
| SCAR             | Misc     | 28.0      |
| SOU              | Politics | 5.3       |
| Subtitles        | Drama    | 1.3       |
| Wikipedia        | Facts    | 1.8       |
## Intended uses & limitations
The raw model can be used for the usual tasks of masked language modeling or next sentence prediction. It is also commonly fine-tuned on a downstream task to improve performance in a specific domain; a hypothetical fine-tuning setup is sketched below.
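
A minimal fine-tuning sketch, assuming a hypothetical binary classification task (the classification head and `num_labels` value are illustrative, not part of this checkpoint):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical downstream task: binary text classification (e.g. sentiment)
tokenizer = AutoTokenizer.from_pretrained("AI-Nordics/bert-large-swedish-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "AI-Nordics/bert-large-swedish-cased",
    num_labels=2,  # assumed label count for the illustrative task
)
# From here, train as usual, e.g. with transformers' Trainer on a labeled Swedish dataset
```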
## How to use
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the pretrained tokenizer and the model with its masked-language-modeling head
tokenizer = AutoTokenizer.from_pretrained("AI-Nordics/bert-large-swedish-cased")
model = AutoModelForMaskedLM.from_pretrained("AI-Nordics/bert-large-swedish-cased")
```
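
For a quick sanity check of the masked-language-modeling head, the checkpoint also works with the `fill-mask` pipeline. The example sentence below is illustrative:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="AI-Nordics/bert-large-swedish-cased")

# "The capital of Sweden is [MASK]."; BERT checkpoints use [MASK] as the mask token
print(unmasker("Huvudstaden i Sverige är [MASK]."))
```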