
---
license: mit
datasets:
- oscar
- DDSC/dagw_reddit_filtered_v1.0.0
- graelo/wikipedia
language:
- da
widget:
- text: Der var engang en [MASK]
---

# What is this?

A pre-trained BERT model (base version, ~110M parameters) for Danish NLP. Rather than being pre-trained from scratch, it was adapted from the English `bert-base-uncased` model using a new tokenizer trained on Danish text.

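# How to use

Try the model with the `fill-mask` pipeline from the [🤗 Transformers](https://github.com/huggingface/transformers) library:

```python
from transformers import pipeline

pipe = pipeline("fill-mask", model="KennethTM/bert-base-uncased-danish")

# "Der var engang en [MASK]" means "Once upon a time there was a [MASK]"
for pred in pipe("Der var engang en [MASK]"):
    # Each prediction is a dict with the filled-in token and its probability
    print(pred["token_str"], round(pred["score"], 3))
```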

Alternatively, load the tokenizer and model directly with the `Auto*` classes:

```python
# Load the tokenizer and the masked-language-model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/bert-base-uncased-danish")
model = AutoModelForMaskedLM.from_pretrained("KennethTM/bert-base-uncased-danish")
```

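# Model training

The model was trained on multiple Danish datasets (`oscar`, `DDSC/dagw_reddit_filtered_v1.0.0`, and `graelo/wikipedia`, as listed in the metadata above) with a context length of 512 tokens.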
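The card does not say how the training text was cut into 512-token inputs. One common approach, used by the `group_texts` step in the Hugging Face `run_mlm.py` example script, is to concatenate the tokenized documents and split the result into fixed-size blocks. A minimal pure-Python sketch of that idea, assuming the token ids are already available; `group_into_blocks` is a hypothetical helper and the toy ids are made up:

```python
from itertools import chain
from typing import List


def group_into_blocks(docs: List[List[int]], block_size: int = 512) -> List[List[int]]:
    """Concatenate tokenized documents and split them into equal-size blocks.

    Hypothetical helper: the leftover tail that does not fill a whole block is dropped.
    """
    concatenated = list(chain.from_iterable(docs))
    # Round the total length down to a multiple of block_size
    total = (len(concatenated) // block_size) * block_size
    return [concatenated[i : i + block_size] for i in range(0, total, block_size)]


# Toy example: three "documents" of fake token ids (1200 ids in total)
docs = [list(range(500)), list(range(400)), list(range(300))]
blocks = group_into_blocks(docs, block_size=512)
print(len(blocks), [len(b) for b in blocks])  # 2 [512, 512]
```

Dropping the remainder keeps every block exactly `block_size` long, so no padding is needed; the cost is losing up to `block_size - 1` tokens per call.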

The model weights were initialized from the English [bert-base-uncased model](https://huggingface.co/bert-base-uncased), with new word token embeddings for the Danish vocabulary created using [WECHSEL](https://github.com/CPJKU/wechsel).
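The core idea behind WECHSEL is to initialize each new (Danish) subword embedding as a similarity-weighted average of existing (English) subword embeddings, where similarity is measured in a shared space of cross-lingually aligned static word embeddings. The simplified NumPy sketch below assumes the aligned vectors for both vocabularies are already given. It is not the `wechsel` package's API: `init_target_embeddings` is a hypothetical function, and the `k` and `temperature` values are illustrative.

```python
import numpy as np


def init_target_embeddings(
    src_aligned: np.ndarray,    # (V_src, d_static) aligned static vectors for source subwords
    tgt_aligned: np.ndarray,    # (V_tgt, d_static) aligned static vectors for target subwords
    src_model_emb: np.ndarray,  # (V_src, d_model) source model's input embeddings
    k: int = 10,
    temperature: float = 0.1,
) -> np.ndarray:
    """Hypothetical sketch: weighted average of the k most similar source embeddings."""
    # Cosine similarity between every target and every source subword
    src = src_aligned / np.linalg.norm(src_aligned, axis=1, keepdims=True)
    tgt = tgt_aligned / np.linalg.norm(tgt_aligned, axis=1, keepdims=True)
    sim = tgt @ src.T  # (V_tgt, V_src)

    k = min(k, sim.shape[1])
    # Indices of the k most similar source subwords for each target subword
    top = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    top_sim = np.take_along_axis(sim, top, axis=1)

    # Softmax over the k similarities; a low temperature sharpens the weights
    logits = top_sim / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted average of the corresponding source model embeddings
    return np.einsum("tk,tkd->td", weights, src_model_emb[top])


rng = np.random.default_rng(0)
src_aligned = rng.normal(size=(50, 8))
tgt_aligned = rng.normal(size=(20, 8))
src_model_emb = rng.normal(size=(50, 16))
tgt_emb = init_target_embeddings(src_aligned, tgt_aligned, src_model_emb)
print(tgt_emb.shape)  # (20, 16)
```

Because each new embedding is a convex combination of existing ones, it starts inside the region of embedding space the pre-trained transformer layers already handle. This is why the rest of the English weights can be reused.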

Training proceeded in two phases: first, only the word token embeddings were trained on 1,000,000 samples; then the whole model was trained for 8 epochs.
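In PyTorch, the first phase (training only the word token embeddings) amounts to freezing every other parameter. The minimal sketch below assumes 🤗 Transformers and PyTorch are installed. So that it runs offline, it builds a tiny randomly initialized BERT from a hypothetical toy config; in practice you would start from the actual checkpoint. The card does not say exactly which parameters were left trainable in phase 1, so `set_phase` is an illustrative helper, not the author's training code.

```python
from transformers import BertConfig, BertForMaskedLM

# Tiny toy config (hypothetical sizes) so the example runs without downloading weights
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertForMaskedLM(config)


def set_phase(model: BertForMaskedLM, embeddings_only: bool) -> None:
    """Phase 1: train only the word token embeddings; phase 2: train everything."""
    for param in model.parameters():
        param.requires_grad = not embeddings_only
    # The word token embeddings stay trainable in both phases
    model.get_input_embeddings().weight.requires_grad = True


def count_trainable(model: BertForMaskedLM) -> int:
    # parameters() yields the tied input/output embedding matrix only once
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


set_phase(model, embeddings_only=True)
print(count_trainable(model))  # 3200 (= vocab_size * hidden_size)

set_phase(model, embeddings_only=False)
print(count_trainable(model) == sum(p.numel() for p in model.parameters()))  # True
```

Since BERT's masked-LM output layer shares its weight matrix with the input word embeddings by default, unfreezing the input embeddings also updates the tied output projection.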


# Evaluation

The performance of the pre-trained model was evaluated using [ScandEval](https://github.com/ScandEval/ScandEval).

| Task                     | Dataset      | Score (±SE)                       |
|:-------------------------|:-------------|:----------------------------------|
| sentiment-classification | swerec       | mcc = 63.02 (±2.16)               |
|                          |              | macro_f1 = 62.2 (±3.61)           |
| sentiment-classification | angry-tweets | mcc = 47.21 (±0.53)               |
|                          |              | macro_f1 = 64.21 (±0.53)          |
| sentiment-classification | norec        | mcc = 42.23 (±8.69)               |
|                          |              | macro_f1 = 57.24 (±7.67)          |
| named-entity-recognition | suc3         | micro_f1 = 50.03 (±4.16)          |
|                          |              | micro_f1_no_misc = 53.55 (±4.57)  |
| named-entity-recognition | dane         | micro_f1 = 76.44 (±1.36)          |
|                          |              | micro_f1_no_misc = 80.61 (±1.11)  |
| named-entity-recognition | norne-nb     | micro_f1 = 68.38 (±1.72)          |
|                          |              | micro_f1_no_misc = 73.08 (±1.66)  |
| named-entity-recognition | norne-nn     | micro_f1 = 60.45 (±1.71)          |
|                          |              | micro_f1_no_misc = 64.39 (±1.8)   |
| linguistic-acceptability | scala-sv     | mcc = 5.01 (±5.41)                |
|                          |              | macro_f1 = 49.46 (±3.67)          |
| linguistic-acceptability | scala-da     | mcc = 54.74 (±12.22)              |
|                          |              | macro_f1 = 76.25 (±6.09)          |
| linguistic-acceptability | scala-nb     | mcc = 19.18 (±14.01)              |
|                          |              | macro_f1 = 55.3 (±8.85)           |
| linguistic-acceptability | scala-nn     | mcc = 5.72 (±5.91)                |
|                          |              | macro_f1 = 49.56 (±3.73)          |
| question-answering       | scandiqa-da  | em = 26.36 (±1.17)                |
|                          |              | f1 = 32.41 (±1.1)                 |
| question-answering       | scandiqa-no  | em = 26.14 (±1.59)                |
|                          |              | f1 = 32.02 (±1.59)                |
| question-answering       | scandiqa-sv  | em = 26.38 (±1.1)                 |
|                          |              | f1 = 32.33 (±1.05)                |
| speed                    | speed        | speed = 4.55 (±0.0)               |
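The sentiment and linguistic-acceptability rows report the Matthews correlation coefficient (MCC) and macro-averaged F1 multiplied by 100, so `mcc = 63.02` corresponds to an MCC of 0.6302. The pure-Python sketch below computes both metrics on made-up toy labels. It uses the multiclass MCC formula that scikit-learn's `matthews_corrcoef` is based on. ScandEval's own implementation may differ in details, so the functions here are illustrative only.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def mcc(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Multiclass Matthews correlation coefficient from label and prediction counts."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))  # number of correct predictions
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # undefined when only one class is present; report 0
    return cov_tp / sqrt(cov_pp * cov_tt)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for k in classes:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy three-class sentiment labels (made up for illustration)
y_true = ["pos", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
print(round(100 * mcc(y_true, y_pred), 2), round(100 * macro_f1(y_true, y_pred), 2))  # 52.22 65.56
```

MCC accounts for all cells of the confusion matrix and is 0 for chance-level predictions. This helps explain why the scala-sv and scala-nn MCC scores near 5 correspond to macro-F1 values close to 50.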