CasperEriksen committed on
Commit
cbf3afb
1 Parent(s): 95f26d8

Add README.md

  1. README.md +31 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ # Classifying Text into DB07 Codes
+
+ This model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) fine-tuned to classify descriptions of business activities into [NACE Rev. 2](https://ec.europa.eu/eurostat/web/nace-rev2) codes (DB07 is the Danish version of NACE Rev. 2).
+
+
+ ## Data
+ The data used to fine-tune the model consist of 2.5 million descriptions of activities from Norwegian and Danish businesses. To improve the model's multilingual performance, random samples were machine-translated into the following languages:
+ - English
+ - German
+ - Spanish
+ - French
+ - Finnish
+
+
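The augmentation step described above could look roughly like the sketch below. It is not the authors' pipeline: the language codes, the `samples_per_language` parameter, the seed, and the `translate` placeholder are all assumptions made for illustration, and no real translation service is called.

```python
import random

# Target languages listed in the README; the ISO 639-1 codes are an assumption.
TARGET_LANGUAGES = ["en", "de", "es", "fr", "fi"]


def translate(text, target_language):
    """Hypothetical placeholder standing in for a machine translation call."""
    return f"[{target_language}] {text}"


def augment(examples, samples_per_language=10, seed=0):
    """Return the original (text, label) pairs plus translated random samples.

    One random sample is drawn per target language. The label is copied
    unchanged, because translation alters the text but not the activity code.
    """
    rng = random.Random(seed)
    sample_size = min(samples_per_language, len(examples))
    augmented = list(examples)
    for language in TARGET_LANGUAGES:
        for text, label in rng.sample(examples, sample_size):
            augmented.append((translate(text, language), label))
    return augmented
```

Drawing a separate sample for each language is one reading of "random samples"; the README does not say how large the samples were or whether they overlapped.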
+ ## Quick Start
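+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+ # Load the fine-tuned tokenizer and classification model from the Hugging Face Hub.
+ model_name = "erst/xlm-roberta-base-finetuned-db07"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # "text-classification" is the canonical name of the task that "sentiment-analysis" aliases.
+ # By default the pipeline returns only the highest-scoring label, so the
+ # deprecated return_all_scores=False argument is not needed.
+ pl = pipeline(
+     "text-classification",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ pl("We sell clothes")
+ ```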
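When classifying many descriptions at once, it may help to pass a list to the pipeline and discard low-confidence predictions. The helper below is a minimal sketch under stated assumptions: `classify_batch` and the `min_score` threshold are hypothetical and not part of the model or of `transformers`, and the helper assumes the classifier returns one `{"label": ..., "score": ...}` dict per input text, which is what a text-classification pipeline returns for a list of strings when `top_k` is not set.

```python
def classify_batch(classifier, texts, min_score=0.5):
    """Classify several texts and blank out predictions below min_score.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": str, "score": float} dicts, such as the pipeline built above.
    The 0.5 default threshold is an arbitrary illustrative choice.
    """
    predictions = classifier(texts)
    results = []
    for text, prediction in zip(texts, predictions):
        confident = prediction["score"] >= min_score
        results.append(
            {
                "text": text,
                # Keep the predicted code only when the model is confident enough.
                "code": prediction["label"] if confident else None,
                "score": prediction["score"],
            }
        )
    return results
```

With the pipeline from the Quick Start, a call might look like `classify_batch(pl, ["We sell clothes", "Vi sælger tøj"])`. A suitable threshold would need to be chosen by checking the model's scores against labeled examples.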