---
license: cc-by-nc-3.0
language:
- da
pipeline_tag: fill-mask
tags:
- bert
- danish
widget:
- text: Hvide blodlegemer beskytter kroppen mod [MASK]
---
Danish medical BERT
MeDa-BERT was initialized with the weights of a pretrained Danish BERT model and then further pretrained for 48 epochs with the masked language modeling (MLM) objective on a Danish medical corpus of 123M tokens.
The development of the corpus and the model is described in more detail in the NoDaLiDa 2023 paper cited below.
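The MLM objective can be illustrated with the masking scheme from the original BERT recipe: roughly 15% of token positions are selected for prediction, and of those, 80% are replaced by `[MASK]`, 10% by a random vocabulary token, and 10% are left unchanged. The sketch below applies that standard scheme to word-level tokens. Whether MeDa-BERT's pretraining used exactly these proportions is an assumption here, and `mask_tokens` is a hypothetical helper, not part of the model's training code.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Standard BERT-style MLM masking (assumed, not confirmed for MeDa-BERT).

    Returns (inputs, labels): labels hold the original token at selected
    positions and None elsewhere, since the MLM loss is only computed
    at the selected positions.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot assume that only `[MASK]` positions matter, which reduces the mismatch with fine-tuning inputs that contain no mask tokens.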
Here is an example of how to load the model in PyTorch using the 🤗 Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Download (or load from the local cache) the tokenizer and the masked-LM model
tokenizer = AutoTokenizer.from_pretrained("jannikskytt/MeDa-Bert")
model = AutoModelForMaskedLM.from_pretrained("jannikskytt/MeDa-Bert")
```
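As a usage sketch, the example sentence "Hvide blodlegemer beskytter kroppen mod [MASK]" ("White blood cells protect the body against [MASK]") can be completed with the 🤗 Transformers `fill-mask` pipeline. The helper names `predict_masked` and `ranked_tokens` are hypothetical, and the code assumes the pipeline's usual output for a single mask: a list of dicts with `score` and `token_str` keys.

```python
from typing import Any

MODEL_ID = "jannikskytt/MeDa-Bert"  # same model ID as in the loading example


def predict_masked(text: str, top_k: int = 5) -> list[dict[str, Any]]:
    """Return MeDa-BERT's top_k candidates for the single [MASK] in `text`.

    transformers is imported lazily so ranked_tokens() works without it;
    the first call downloads the model from the Hugging Face Hub.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return fill_mask(text, top_k=top_k)


def ranked_tokens(predictions: list[dict[str, Any]]) -> list[tuple[str, float]]:
    """Turn fill-mask output into (token, score) pairs, highest score first."""
    pairs = [(p["token_str"].strip(), float(p["score"])) for p in predictions]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Example (needs network access on first run):
#   preds = predict_masked("Hvide blodlegemer beskytter kroppen mod [MASK]")
#   for token, score in ranked_tokens(preds):
#       print(token, round(score, 3))
```

Separating the model call from the post-processing keeps the ranking logic usable on cached or logged predictions without loading the model.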
Citing
```bibtex
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik and
      Laursen, Martin and
      Vinholt, Pernille and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```