---
license: cc-by-nc-3.0
language:
- da
tags:
- word embeddings
- Danish
---

# Danish medical word embeddings

MeDa-We is a set of word embeddings trained on a Danish medical corpus of 123M tokens. The embeddings are 300-dimensional and were trained with [FastText](https://fasttext.cc/) for 10 epochs, using a window size of 5 and 10 negative samples.

The development of the corpus and word embeddings is described further in our [paper](https://aclanthology.org/2023.nodalida-1.31/).

We also trained a transformer model on the same corpus, which can be found [here](https://huggingface.co/jannikskytt/MeDa-Bert).

### Citing

```
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik and
      Laursen, Martin and
      Vinholt, Pernille and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```
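
### Usage sketch

Since the embeddings were trained with FastText, one plausible way to work with them is through FastText's plain-text `.vec` format (a header line with the vocabulary size and dimension, followed by one word and its vector per line). The sketch below is a minimal, standard-library-only illustration of parsing that format and comparing two words by cosine similarity. It assumes a `.vec` export is available; this card does not state which file format is distributed. The helper names `load_vec` and `cosine` are hypothetical, and the three-dimensional vectors are toy values, not real MeDa-We embeddings.

```python
import io
import math
from typing import Dict, List, TextIO


def load_vec(handle: TextIO) -> Dict[str, List[float]]:
    """Parse FastText's plain-text .vec format into a word -> vector dict."""
    # The first line holds "<vocabulary size> <dimension>".
    header = handle.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for a real .vec file (3 dimensions instead of 300).
toy_vec = io.StringIO(
    "3 3\n"
    "hjerte 1.0 0.0 0.0\n"
    "puls 0.8 0.6 0.0\n"
    "knogle 0.0 0.0 1.0\n"
)
vectors = load_vec(toy_vec)
print(cosine(vectors["hjerte"], vectors["puls"]))
print(cosine(vectors["hjerte"], vectors["knogle"]))
```

For a real file, the `io.StringIO` object would be replaced by something like `open(path, encoding="utf-8")`, where `path` points to wherever the `.vec` file was saved. With 300-dimensional vectors and a large vocabulary, a NumPy-based or library-based loader would likely be faster; the pure-Python version is kept here only for clarity.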