---
language: da
widget:
- text: En trend, der kan blive ligeså hot som<mask>.
tags:
- roberta
- danish
- masked-lm
- pytorch
license: agpl-3.0
---

# DanskBERT

This is DanskBERT, a Danish language model. Note that you should not prepend the mask token with a space when using the model directly!

The model is the best-performing base-size model on the [ScandEval benchmark for Danish](https://scandeval.github.io/nlu-benchmark/).

DanskBERT was trained on the Danish Gigaword Corpus (Strømberg-Derczynski et al., 2021).

DanskBERT was trained with fairseq using the RoBERTa-base configuration. The model was trained with a batch size of 2k and was trained to convergence for 500k steps on 16 V100 cards over approximately two weeks.

If you find this model useful, please cite

```
@inproceedings{snaebjarnarson-etal-2023-transfer,
    title = "{T}ransfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese",
    author = "Snæbjarnarson, Vésteinn and
      Simonsen, Annika and
      Glavaš, Goran and
      Vulić, Ivan",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = "may 22--24",
    year = "2023",
    address = "Tórshavn, Faroe Islands",
    publisher = {Link{\"o}ping University Electronic Press, Sweden},
}
```
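
As a minimal usage sketch, the model can be loaded with the Hugging Face `transformers` fill-mask pipeline; the repository id `vesteinn/DanskBERT` and the example sentence below are assumptions, and the mask token is placed directly after the preceding word without a space, as noted above.

```python
# Minimal sketch: masked-token prediction with DanskBERT via the
# transformers fill-mask pipeline. The hub id "vesteinn/DanskBERT"
# is an assumption; adjust it to the actual repository id if needed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="vesteinn/DanskBERT")

# Note: the mask token follows the previous word without a space.
predictions = fill_mask("En trend, der kan blive ligeså hot som<mask>.")

for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```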