metadata

language: da
tags:
  - danish
  - bert
  - sentiment
  - text-classification
  - Maltehb/danish-bert-botxo
  - Helsinki-NLP/opus-mt-en-da
  - go-emotion
  - Certainly
license: cc-by-4.0
datasets:
  - go_emotions
metrics:
  - Accuracy
widget:
  - text: Det er så sødt af dig at tænke på andre på den måde ved du det?
  - text: Jeg vil gerne have en playstation.
  - text: Jeg elsker dig
  - text: Hvordan håndterer jeg min irriterende nabo?

Danish-Bert-GoÆmotion

Danish Go-Emotions classifier. Maltehb/danish-bert-botxo (uncased) finetuned on a translation of the go_emotions dataset using Helsinki-NLP/opus-mt-en-da. Thus, performance is obviousely dependent on the translation model.

Training

Translating the training data with MT: Notebook
Fine-tuning danish-bert-botxo: coming soon...

Training Parameters:

Num examples = 189900
Num Epochs = 3
Train batch = 8
Eval batch = 8
Learning Rate = 3e-5
Warmup steps = 4273
Total optimization steps = 71125

Loss

Training loss

Eval. loss

0.1178 (21100 examples)

Using the model with `transformers`

Easiest use with transformers and pipeline:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model = AutoModelForSequenceClassification.from_pretrained('RJuro/Da-HyggeBERT')
tokenizer = AutoTokenizer.from_pretrained('RJuro/Da-HyggeBERT')

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

classifier('jeg elsker dig')

[{'label': 'kærlighed', 'score': 0.9634820818901062}]

Using the model with `simpletransformers`

from simpletransformers.classification import MultiLabelClassificationModel

model = MultiLabelClassificationModel('bert', 'RJuro/Da-HyggeBERT')

predictions, raw_outputs = model.predict(df['text'])