---
license: cc-by-nc-3.0
language:
- da
pipeline_tag: fill-mask
tags:
- bert
- danish
widget:
- text: Hvide blodlegemer beskytter kroppen mod [MASK]
---


# Danish medical BERT

MeDa-BERT was initialized with weights from a [pretrained Danish BERT model](https://huggingface.co/Maltehb/danish-bert-botxo) and then further pretrained for 48 epochs with the masked language modeling (MLM) objective on a Danish medical corpus of 123M tokens.

The development of the corpus and model is described further in [this paper](https://aclanthology.org/2023.nodalida-1.31/).

Here is an example of how to load the model in PyTorch using the [🤗Transformers](https://github.com/huggingface/transformers) library:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Download (or load from the local cache) the tokenizer and the MLM weights.
tokenizer = AutoTokenizer.from_pretrained("indsigt-ai/MeDa-BERT")
model = AutoModelForMaskedLM.from_pretrained("indsigt-ai/MeDa-BERT")
```
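
Since the model is tagged for the fill-mask task, one way to try it is through the Transformers `pipeline` API. The following is a minimal sketch rather than an official usage recipe: the helper names `summarize` and `fill_mask` are hypothetical and introduced only for illustration, and the default `top_k=5` is an arbitrary choice.

```python
MODEL_ID = "indsigt-ai/MeDa-BERT"  # repository name taken from the loading example


def summarize(predictions):
    """Reduce fill-mask pipeline output to (token, score) pairs, best first.

    Hypothetical helper: expects a list of dicts with "token_str" and "score" keys.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 4)) for p in ranked]


def fill_mask(text, top_k=5):
    """Predict the single [MASK] token in `text`.

    Hypothetical helper: downloads the model on first call.
    """
    # Imported lazily so that summarize() can be used without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=MODEL_ID)
    return summarize(unmasker(text, top_k=top_k))
```

Calling `fill_mask("Hvide blodlegemer beskytter kroppen mod [MASK]")` with the widget sentence from this model card should return the five highest-scoring candidate tokens with their probabilities. The input is assumed to contain exactly one `[MASK]` token.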

### Citing

```
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik  and
      Laursen, Martin  and
      Vinholt, Pernille  and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```