---
license: cc-by-nc-3.0
language:
- da
tags:
- word embeddings
- Danish
---
# Danish medical word embeddings

MeDa-WE was trained on a Danish medical corpus of 123M tokens. The word embeddings are 300-dimensional and were trained using [FastText](https://fasttext.cc/).
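The card does not list which files the repository ships. As a minimal sketch, assuming the vectors are available in fastText's plain-text `.vec` format (a header line with vocabulary size and dimension, then one word followed by its 300 floats per line), they can be read and queried for nearest neighbours with the standard library alone. The function names below are hypothetical, not part of any MeDa-WE or fastText API.

```python
import math
from typing import Dict, Iterable, List, Tuple


def parse_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word vectors from lines in the (assumed) fastText .vec text format.

    Assumed layout: header "<vocab_size> <dim>", then "word v1 v2 ... v_dim" per line.
    """
    it = iter(lines)
    _, dim = map(int, next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        # fastText writes a trailing space after the last value, so strip it first.
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_vec(path: str) -> Dict[str, List[float]]:
    """Read a .vec file from disk (the file name is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return parse_vec(f)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(
    vectors: Dict[str, List[float]], query: str, topn: int = 5
) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `query` (KeyError if absent)."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]
```

For example, `most_similar(load_vec("meda_we.vec"), "hjerte")` would list neighbours of the Danish word for "heart" (`meda_we.vec` is a placeholder name). Plain Python lists are slow and memory-hungry for a full vocabulary. For real use, gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` reads the same text format, and a fastText `.bin` model, if one is provided, can be opened with `fasttext.load_model`, which can also build vectors for out-of-vocabulary words from character n-grams.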

The embeddings were trained for 10 epochs with a context window size of 5 and with 10 negative samples.
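The stated hyperparameters map onto options of fastText's Python `train_unsupervised` function (`dim`, `epoch`, `ws`, `neg`). The sketch below is not the authors' training script: the card does not say whether skip-gram or CBOW was used, nor how the corpus was preprocessed, so `model="skipgram"` (fastText's default) and every other option are assumptions left at library defaults.

```python
from typing import Any, Dict

# Hyperparameters stated in the model card.
MEDA_WE_PARAMS: Dict[str, Any] = {
    "dim": 300,   # vector dimensionality
    "epoch": 10,  # passes over the training corpus
    "ws": 5,      # context window size
    "neg": 10,    # negative samples per positive example
}


def train(corpus_path: str, model: str = "skipgram"):
    """Train fastText embeddings with the stated hyperparameters.

    Assumption: `model` (skipgram vs. cbow) and all options not listed in
    MEDA_WE_PARAMS are not specified by the card and stay at library defaults.
    """
    # Imported lazily so the parameter dict can be inspected without fastText installed.
    import fasttext

    return fasttext.train_unsupervised(corpus_path, model=model, **MEDA_WE_PARAMS)
```

Here `train("corpus.txt")` takes a hypothetical path to a plain, whitespace-tokenized text file and returns a fastText model that can be saved with `save_model`. The command-line equivalent would be `fasttext skipgram -input corpus.txt -output meda_we -dim 300 -epoch 10 -ws 5 -neg 10`.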

The development of the corpus and the word embeddings is described in more detail in our [paper](https://aclanthology.org/2023.nodalida-1.31/).

We also trained a transformer model, [MeDa-BERT](https://huggingface.co/jannikskytt/MeDa-Bert), on the same corpus.

### Citing

```bibtex
@inproceedings{pedersen-etal-2023-meda,
  title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
  author = "Pedersen, Jannik and
    Laursen, Martin and
    Vinholt, Pernille and
    Savarimuthu, Thiusius Rajeeth",
  booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
  month = may,
  year = "2023",
  address = "T{\'o}rshavn, Faroe Islands",
  publisher = "University of Tartu Library",
  url = "https://aclanthology.org/2023.nodalida-1.31",
  pages = "301--307",
}
```