nosdigitalmedia committed on
Commit
813ae9b
1 Parent(s): c78b4ec

Update README.md

Files changed (1)
  1. README.md +68 -1
README.md CHANGED
@@ -1,3 +1,70 @@
  ---
- license: apache-2.0
+ tags:
+ - sklearn
+ - text-classification
+ language:
+ - nl
+ metrics:
+ - accuracy
+ - hamming-loss
  ---
+
+
+ # Model card for NOS Drug-Related Text Classification on Telegram
+ The NOS editorial team is conducting an investigation into drug-related messages on Telegram. Thousands of Telegram messages have been labeled as drug-related or not, along with details about the specific type of drugs and the delivery method. This labeled data is used to train a model that scales up the labeling and automatically labels millions more messages.
+
+ ## Methodology
+ First, a logistic regression model was trained for binary classification. Text was converted to numeric features using scikit-learn's TfidfVectorizer, which weights each term by its term frequency-inverse document frequency (TF-IDF). This representation lets the model learn which words are associated with each class. The model achieved 97% accuracy on the test set.
+ To handle messages that can carry several labels at once, the model was extended with a MultiOutputClassifier, which fits one classifier per label. This addresses the complexity of associating a text message with multiple categories such as "soft drugs", "hard drugs", and "medicines". The labels were one-hot encoded into binary indicator columns, one per label.
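The multi-label setup described above can be sketched roughly as follows. This is a minimal illustration, not the NOS training code: the toy messages and tags are invented, `MultiLabelBinarizer` is used here as an assumed stand-in for the one-hot encoding step, and all hyperparameters are scikit-learn defaults.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data, invented for illustration only.
texts = [
    "coke en xtc te koop, bezorging mogelijk",
    "wiet en hasj, ophalen in de stad",
    "oude kleding te koop",
    "fiets te koop, stuur een bericht",
    "xtc pillen, verzending per post",
    "wiet bezorging vandaag",
]
tags = [
    {"harddrugs", "bezorging"},
    {"softdrugs", "pickup"},
    set(),  # not drug-related: no tags
    set(),
    {"harddrugs", "post"},
    {"softdrugs", "bezorging"},
]

# Turn each set of tags into a row of 0/1 indicators, one column per label.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)

# TF-IDF features feed one logistic regression per label column.
model = make_pipeline(
    TfidfVectorizer(),
    MultiOutputClassifier(LogisticRegression()),
)
model.fit(texts, Y)

# Predictions come back as one row of 0/1 indicators per message.
pred = model.predict(["xtc te koop"])
print(binarizer.classes_)
print(pred)
print(binarizer.inverse_transform(pred))
```

Wrapping the vectorizer and classifier in one pipeline means the fitted object accepts raw strings directly, which matches how `clf.predict(text_messages)` is called in the getting-started code.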
+ Performance was evaluated with Hamming loss, a metric suited to multi-label classification. The model reached a Hamming loss of 0.04, meaning that on average 96% of the individual label assignments are correct.
+
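To make the relation between a Hamming loss of 0.04 and 96% per-label accuracy concrete, here is a small worked example. The label matrices are hypothetical and chosen so that exactly one of 25 label slots is wrong; they are not taken from the NOS test set.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Hypothetical ground truth: 5 messages x 5 labels = 25 label slots.
y_true = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
])

# Hypothetical predictions: identical except for one missed label.
y_pred = y_true.copy()
y_pred[0, 3] = 0

# Hamming loss is the fraction of label slots that are wrong: 1 / 25.
loss = hamming_loss(y_true, y_pred)
manual = np.mean(y_true != y_pred)

# Per-label accuracy is the complement of the Hamming loss.
per_label_accuracy = 1 - loss

# Exact-match (subset) accuracy is stricter: a message only counts
# as correct when all of its labels are right.
subset_accuracy = accuracy_score(y_true, y_pred)

print(loss, per_label_accuracy, subset_accuracy)
```

Here the Hamming loss is 0.04 and the per-label accuracy 0.96, while the exact-match accuracy is only 0.8 because one of the five messages has a wrong label.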
+ ### Tools used to train the model
+ - Python
+ - scikit-learn
+ - pandas
+ - numpy
+
+ ### How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ from joblib import load
+
+ # load the trained model from disk
+ clf = load('model.joblib')
+
+ # example messages (Dutch): a second-hand clothing ad and a drug price list
+ text_messages = [
+     """
+ Oud kleding te koop! Stuur een berichtje
+ We repareren ook!
+ """,
+     """
+ COKE/XTC
+ * 1Gram = €50
+ * 5Gram = €230
+ """,
+ ]
+
+ # column index of the multi-label output -> label name
+ mapping = {
+     0: "bezorging", 1: "bulk", 2: "designer", 3: "drugsad", 4: "geendrugsad",
+     5: "harddrugs", 6: "medicijnen", 7: "pickup", 8: "post", 9: "softdrugs",
+ }
+
+ labels = []
+
+ # each prediction is one row of 0/1 indicators, one per label
+ for message in clf.predict(text_messages):
+     label = []
+     for idx, labeled in enumerate(message):
+         if labeled == 1:
+             label.append(mapping[idx])
+     labels.append(label)
+
+ # one list of label names per input message
+ print(labels)
+
+ ```
+
+ ## Details
+ - **Shared by:** Dutch Public Broadcasting Foundation (NOS)
+ - **Model type:** text-classification
+ - **Language:** Dutch
+ - **License:** Creative Commons Attribution Non Commercial No Derivatives 4.0