File size: 1,049 Bytes
7aa08f7 cbf3afb 743db22 cbf3afb 84d9e5e cbf3afb 743db22 cbf3afb fcdda36 cbf3afb |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
# Classifying Text into NACE Codes
This model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) fine-tuned to classify descriptions of activities into [NACE Rev. 2](https://ec.europa.eu/eurostat/web/nace-rev2) codes.
## Data
The data used to fine-tune the model consist of 2.5 million descriptions of activities from Norwegian and Danish businesses. To improve the model's multilingual performance, random samples of the Norwegian and Danish descriptions were machine translated into the following languages:
- English
- German
- Spanish
- French
- Finnish
- Polish
## Quick Start
```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("erst/xlm-roberta-base-finetuned-nace")
model = AutoModelForSequenceClassification.from_pretrained("erst/xlm-roberta-base-finetuned-nace")
pl = pipeline(
"sentiment-analysis",
model=model,
tokenizer=tokenizer,
return_all_scores=False,
)
pl("The purpose of our company is to build houses")
```
|