Approximately 2.5 million business names and descriptions of activities from Norwegian and Danish businesses were used to fine-tune the model. The Norwegian descriptions were translated into Danish and the Norwegian SN 2007 codes were translated into Danish DB07 codes.
Activity descriptions and business names were concatenated but separated by the separator token
</s>. Thus, the model was trained on input texts in the format
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification tokenizer = AutoTokenizer.from_pretrained("erst/xlm-roberta-base-finetuned-db07") model = AutoModelForSequenceClassification.from_pretrained("erst/xlm-roberta-base-finetuned-db07") pl = pipeline( "sentiment-analysis", model=model, tokenizer=tokenizer, return_all_scores=False, ) pl("Vi sælger sko") pl("We sell clothes</s>Clothing ApS")
- Downloads last month