# Text Classification into DB07 codes
This model is a fine-tuned version of xlm-roberta-base that classifies Danish descriptions of activities into Dansk Branchekode DB07 codes.
## Data
The model was fine-tuned on approximately 2.5 million descriptions of activities written by Norwegian and Danish businesses. The Norwegian descriptions were translated into Danish, and the Norwegian SN 2007 codes were converted into Danish DB07 codes.
## Quick Start
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = "CasperEriksen/xlm-roberta-base-finetuned-db07"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# "text-classification" is the canonical name of the "sentiment-analysis" task.
# By default the pipeline returns only the top-scoring label for each input.
pl = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

# Classify a Danish activity description ("Sale of clothing").
print(pl("Salg af tøj"))
```
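For each input, the pipeline returns a dict with a `label` and a `score`. Below is a minimal sketch of how such output could be post-processed to keep only confident predictions. The helper name `confident_predictions`, the 0.5 threshold, and the sample labels and scores are illustrative assumptions, not actual outputs of this model.

```python
from typing import Dict, List, Optional, Union

Prediction = Dict[str, Union[str, float]]


def confident_predictions(
    texts: List[str],
    predictions: List[Prediction],
    threshold: float = 0.5,  # assumed cut-off, tune for your use case
) -> Dict[str, Optional[str]]:
    """Map each text to its predicted label, or None if the score is below threshold."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    return {
        text: (pred["label"] if pred["score"] >= threshold else None)
        for text, pred in zip(texts, predictions)
    }


# Hypothetical pipeline output; real labels and scores will differ.
texts = ["Salg af tøj", "Diverse aktiviteter"]
sample_output = [
    {"label": "LABEL_A", "score": 0.91},
    {"label": "LABEL_B", "score": 0.32},
]

result = confident_predictions(texts, sample_output)
print(result)
```

With the real model, passing a list of strings such as `pl(texts)` should yield a list in this shape that can be handed to the helper directly.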