---
license: mit
datasets:
- oscar
- DDSC/dagw_reddit_filtered_v1.0.0
- graelo/wikipedia
language:
- da
widget:
- text: Der var engang en [MASK]
---

# What is this?

A pre-trained BERT model (base version, ~110M parameters) for Danish NLP. The model was not pre-trained from scratch; it was adapted from the English BERT base model and paired with a new tokenizer trained on Danish text.

# How to use

Test the model using the pipeline from the [🤗 Transformers](https://github.com/huggingface/transformers) library:

```python
from transformers import pipeline

pipe = pipeline("fill-mask", model="KennethTM/bert-base-uncased-danish")

pipe("Der var engang en [MASK]")
```
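
For a single `[MASK]`, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str` and `sequence`. The sketch below shows one way to reduce that output to `(token, score)` pairs. The helper names `top_tokens` and `demo` are made up for this example and are not part of the library.

```python
def top_tokens(fill_mask, text, k=3):
    """Return (token, score) pairs for the k most likely mask fillers.

    Assumes `text` contains exactly one [MASK]; with several masks the
    pipeline returns a nested list instead.
    """
    predictions = fill_mask(text, top_k=k)
    return [(p["token_str"], p["score"]) for p in predictions]


def demo():
    # Imported lazily so the helper above has no hard dependency.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="KennethTM/bert-base-uncased-danish")
    for token, score in top_tokens(pipe, "Der var engang en [MASK]"):
        print(f"{token}\t{score:.3f}")
```

Calling `demo()` downloads the model on first use and prints the three highest-scoring tokens.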

Or load it using the Auto* classes:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/bert-base-uncased-danish")
model = AutoModelForMaskedLM.from_pretrained("KennethTM/bert-base-uncased-danish")
```

# Model training

The model was trained on several Danish datasets (OSCAR, DDSC/dagw_reddit_filtered_v1.0.0 and graelo/wikipedia) with a context length of 512 tokens.

The model weights were initialized from the English [bert-base-uncased model](https://huggingface.co/bert-base-uncased), with new word token embeddings created for Danish using [WECHSEL](https://github.com/CPJKU/wechsel).

Training proceeded in two stages. First, only the word token embeddings were trained, using 1,000,000 samples. Then the whole model was trained for 8 epochs.
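
The two stages can be sketched as toggling `requires_grad` on the model parameters. This is a minimal illustration and not the actual training code. It assumes the word embedding parameter has `word_embeddings` in its name, as in `bert.embeddings.word_embeddings.weight` for the Transformers BERT implementation. The function names are hypothetical.

```python
def freeze_all_but_word_embeddings(model):
    """Stage 1: leave only the word token embeddings trainable.

    Assumption: the embedding parameter name contains "word_embeddings".
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = "word_embeddings" in name
        if param.requires_grad:
            trainable.append(name)
    return trainable


def unfreeze_all(model):
    """Stage 2: make every parameter trainable again."""
    for _, param in model.named_parameters():
        param.requires_grad = True
```

With a model loaded through `AutoModelForMaskedLM.from_pretrained(...)`, `freeze_all_but_word_embeddings(model)` would be called before the first stage and `unfreeze_all(model)` before the second.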


# Evaluation

The performance of the pretrained model was evaluated using [ScandEval](https://github.com/ScandEval/ScandEval).

| Task                     | Dataset      | Score (±SE)                      |
|:-------------------------|:-------------|:---------------------------------|
| sentiment-classification | swerec       | mcc = 63.02 (±2.16)              |
|                          |              | macro_f1 = 62.2 (±3.61)          |
| sentiment-classification | angry-tweets | mcc = 47.21 (±0.53)              |
|                          |              | macro_f1 = 64.21 (±0.53)         |
| sentiment-classification | norec        | mcc = 42.23 (±8.69)              |
|                          |              | macro_f1 = 57.24 (±7.67)         |
| named-entity-recognition | suc3         | micro_f1 = 50.03 (±4.16)         |
|                          |              | micro_f1_no_misc = 53.55 (±4.57) |
| named-entity-recognition | dane         | micro_f1 = 76.44 (±1.36)         |
|                          |              | micro_f1_no_misc = 80.61 (±1.11) |
| named-entity-recognition | norne-nb     | micro_f1 = 68.38 (±1.72)         |
|                          |              | micro_f1_no_misc = 73.08 (±1.66) |
| named-entity-recognition | norne-nn     | micro_f1 = 60.45 (±1.71)         |
|                          |              | micro_f1_no_misc = 64.39 (±1.8)  |
| linguistic-acceptability | scala-sv     | mcc = 5.01 (±5.41)               |
|                          |              | macro_f1 = 49.46 (±3.67)         |
| linguistic-acceptability | scala-da     | mcc = 54.74 (±12.22)             |
|                          |              | macro_f1 = 76.25 (±6.09)         |
| linguistic-acceptability | scala-nb     | mcc = 19.18 (±14.01)             |
|                          |              | macro_f1 = 55.3 (±8.85)          |
| linguistic-acceptability | scala-nn     | mcc = 5.72 (±5.91)               |
|                          |              | macro_f1 = 49.56 (±3.73)         |
| question-answering       | scandiqa-da  | em = 26.36 (±1.17)               |
|                          |              | f1 = 32.41 (±1.1)                |
| question-answering       | scandiqa-no  | em = 26.14 (±1.59)               |
|                          |              | f1 = 32.02 (±1.59)               |
| question-answering       | scandiqa-sv  | em = 26.38 (±1.1)                |
|                          |              | f1 = 32.33 (±1.05)               |
| speed                    | speed        | speed = 4.55 (±0.0)              |