language: da
tags:
  - danish
  - bert
  - masked-lm
  - botxo
license: cc-by-4.0
datasets:
  - common_crawl
  - wikipedia
  - dindebat.dk
  - hestenettet.dk
  - danish_OpenSubtitles
widget:
  - text: Chili Jensen, som bor på Danmarksgade 12, køber chilifrugter fra Netto.

Danish BERT (version 2, uncased) by Certainly (previously known as BotXO), fine-tuned for Named Entity Recognition on the DaNE dataset (Hvingelby et al., 2020) by Malte Højmark-Bertelsen.

A humongous amount of credit needs to go to Certainly (previously known as BotXO) for pretraining the Danish BERT. For data and training details, see their GitHub repository or this article. You can also visit their organization page on Hugging Face.

The model is available in both TensorFlow and PyTorch formats. The original TensorFlow version can be downloaded using this link.

Here is an example of how to load Danish BERT for token classification in PyTorch using the 🤗 Transformers library:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("Maltehb/danish-bert-botxo-ner-dane")
model = AutoModelForTokenClassification.from_pretrained("Maltehb/danish-bert-botxo-ner-dane")
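
Once the model is loaded, a forward pass yields one label per WordPiece token, and these usually need to be merged into entity spans. The following is a minimal sketch of that step, not the author's own inference code. It assumes the model's `config.id2label` maps to BIO-style tags such as `B-PER` and `I-LOC` (the scheme DaNE uses), and the helper names `predict_labels` and `group_entities` are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple


def predict_labels(text: str, model_name: str = "Maltehb/danish-bert-botxo-ner-dane"):
    """Return (tokens, labels) for `text`. Hypothetical helper; downloads the model on first use."""
    # Imported lazily so the grouping helper below can be used without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    # Assumption: id2label holds BIO tags such as "B-PER", "I-PER", "O".
    labels = [model.config.id2label[i] for i in label_ids]
    return tokens, labels


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-labelled WordPiece tokens into (entity_type, text) spans."""
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_words: List[str] = []

    def flush() -> None:
        nonlocal current_type, current_words
        if current_type is not None and current_words:
            entities.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for token, label in zip(tokens, labels):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # skip BERT special tokens
        if token.startswith("##"):
            # Continuation piece: glue it onto the previous word of an open entity.
            if current_words:
                current_words[-1] += token[2:]
            continue
        if label == "O":
            flush()
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_words.append(token)
    flush()
    return entities
```

With the model available, `group_entities(*predict_labels("Chili Jensen bor i Danmark."))` returns a list of `(entity_type, text)` pairs. Because the model is uncased, the recovered entity text is lowercased.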

References

Danish BERT. (2020). BotXO. https://github.com/botxo/nordic_bert (Original work published 2019)

Hvingelby, R., Pauli, A. B., Barrett, M., Rosted, C., Lidegaard, L. M., & Søgaard, A. (2020). DaNE: A Named Entity Resource for Danish. Proceedings of the 12th Language Resources and Evaluation Conference, 4597–4604. https://www.aclweb.org/anthology/2020.lrec-1.565

Contact

For help or further information, feel free to contact the author, Malte Højmark-Bertelsen, at hjb@kmd.dk or on any of the following platforms:

Twitter: MalteHB | LinkedIn: MalteHB | Instagram: MalteHB