Text Classification into DB07 codes

This model is a fine-tuned version of xlm-roberta-base. It classifies Danish descriptions of activities into Dansk Branchekode (DB07) codes.

Data

Approximately 2.5 million descriptions of activities written by Norwegian and Danish businesses were used to fine-tune the model. The Norwegian descriptions were translated into Danish, and the Norwegian SN 2007 codes were mapped to Danish DB07 codes.

Quick Start

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("CasperEriksen/xlm-roberta-base-finetuned-db07")
model = AutoModelForSequenceClassification.from_pretrained("CasperEriksen/xlm-roberta-base-finetuned-db07")

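# "text-classification" is the canonical task name ("sentiment-analysis" is an alias for it).
# By default the pipeline returns only the top label and its score.
pl = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

# Classify a Danish activity description ("Salg af tøj" = "Sale of clothing")
pl("Salg af tøj")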
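
The pipeline above returns only the single most likely code. When a description is ambiguous, it can help to look at several candidates. The following is a minimal sketch of that post-processing step, not part of the model's own API: `top_k_codes` is a hypothetical helper, and the logits and label names in the example are placeholders rather than real model output.

```python
import math


def top_k_codes(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits.

    Hypothetical helper for illustration; it is not shipped with the model.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Placeholder logits and label names, for illustration only.
example_logits = [2.0, 0.5, -1.0, 3.0]
example_labels = {0: "CODE_A", 1: "CODE_B", 2: "CODE_C", 3: "CODE_D"}

for label, prob in top_k_codes(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, the logits could presumably be obtained from `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()` and the label mapping from `model.config.id2label`. How the labels are named depends on how the checkpoint was exported.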