Danish BERT fine-tuned for Sentiment Analysis with senda

This model detects the polarity ('positive', 'neutral', 'negative') of Danish texts.

It is trained and tested on tweets annotated by the Alexandra Institute. The model is trained with the senda package.

Here is an example of how to load the model in PyTorch using the 🤗Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("pin/senda")
model = AutoModelForSequenceClassification.from_pretrained("pin/senda")

# create 'senda' sentiment analysis pipeline 
senda_pipeline = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

text = "Sikke en dejlig dag det er i dag"
# in English: 'What a lovely day it is today'
senda_pipeline(text)
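
The pipeline call returns a list with one dictionary per input text, each holding a 'label' and a 'score'. As a rough illustration of where those two values come from, the sketch below applies a softmax to a set of made-up logits and picks the highest-scoring class. The label order in LABELS and the example logits are assumptions for illustration only; the actual mapping for this model should be read from model.config.id2label.

```python
import math

# Hypothetical label order, for illustration only.
# The real mapping is stored in model.config.id2label.
LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_pipeline_output(logits, labels=LABELS):
    """Mimic the shape of a text-classification pipeline result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": labels[best], "score": probs[best]}]


# Made-up logits, not produced by the senda model.
example_logits = [-1.2, 0.3, 2.5]
result = to_pipeline_output(example_logits)
print(result[0]["label"], round(result[0]["score"], 3))
```

The score is the probability of the winning class only, so a low score on a 'neutral' prediction may indicate that the model was close to choosing another polarity.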

Performance

The senda model achieves an accuracy of 0.77 and a macro-averaged F1-score of 0.73 on a small test data set provided by the Alexandra Institute. The model can almost certainly be improved, and we encourage all NLP enthusiasts to give it their best shot - you can use the senda package to do this.
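
For readers who want to compare their own results against these two figures, here is a minimal sketch of how accuracy and macro-averaged F1 are computed for a three-class problem. The labels and predictions below are toy data invented for the example, not the Alexandra Institute test set, and the function names are hypothetical helpers rather than part of senda.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data, invented for illustration.
labels = ["positive", "neutral", "negative"]
y_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={acc:.2f}, macro F1={f1:.2f}")
```

Because macro averaging weights every class equally, it can sit below accuracy when a less frequent polarity is predicted poorly, which is one plausible reading of the gap between 0.77 and 0.73.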

Contact

Feel free to contact the author, Lars Kjeldgaard, at lars.kjeldgaard@eb.dk.

Shout-outs

Props to Malte Højmark-Berthelsen for pretraining Danish BERT and for helping add a TensorFlow backend for senda.
