---
license: cc-by-nc-3.0
language:
- da
pipeline_tag: fill-mask
tags:
- bert
- danish
widget:
- text: Hvide blodlegemer beskytter kroppen mod [MASK]
---
# Danish medical BERT
MeDa-BERT was initialized with the weights of a [pretrained Danish BERT model](https://huggingface.co/Maltehb/danish-bert-botxo) and further pretrained for 48 epochs with the masked language modeling (MLM) objective on a Danish medical corpus of 123M tokens.
The development of the corpus and model is described further in [this paper](https://aclanthology.org/2023.nodalida-1.31/).
Here is an example of how to load the model in PyTorch using the [🤗 Transformers](https://github.com/huggingface/transformers) library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("indsigt-ai/MeDa-Bert")
model = AutoModelForMaskedLM.from_pretrained("indsigt-ai/MeDa-Bert")
```
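Since the model card declares the `fill-mask` pipeline, the loaded model can also be queried for masked-token predictions. The following is a minimal sketch rather than an official usage recipe: `predict_masked` and `format_predictions` are hypothetical helper names introduced here for illustration, and the default `top_k` of 5 is an arbitrary choice.

```python
from typing import Dict, List

# Repository name taken from the loading example above.
MODEL_ID = "indsigt-ai/MeDa-Bert"


def predict_masked(text: str, top_k: int = 5, model_id: str = MODEL_ID) -> List[Dict]:
    """Return the top_k candidate fills for a single [MASK] token in `text`.

    Hypothetical helper: downloads the model on first use.
    """
    # Imported lazily so the formatting helper below can be used
    # without loading transformers or the model weights.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    return fill_mask(text, top_k=top_k)


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Render fill-mask results as 'token (score)' strings, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str']} ({p['score']:.3f})" for p in ranked]


# Example call (requires network access to fetch the model):
# for line in format_predictions(
#     predict_masked("Hvide blodlegemer beskytter kroppen mod [MASK]")
# ):
#     print(line)
```

The example sentence is the widget text from the model card metadata. Each prediction returned by the pipeline is a dictionary that includes `token_str` and `score` fields, which is all the formatting helper relies on.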
### Citing
```bibtex
@inproceedings{pedersen-etal-2023-meda,
title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
author = "Pedersen, Jannik and
Laursen, Martin and
Vinholt, Pernille and
Savarimuthu, Thiusius Rajeeth",
booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
month = may,
year = "2023",
address = "T{\'o}rshavn, Faroe Islands",
publisher = "University of Tartu Library",
url = "https://aclanthology.org/2023.nodalida-1.31",
pages = "301--307",
}
```