---
tags:
- sklearn
- text-classification
language:
- nl
metrics:
- accuracy
- hamming-loss
---

# Model card for NOS Drug-Related Text Classification on Telegram

The NOS editorial team is conducting an investigation into drug-related messages on Telegram. Thousands of Telegram messages have been labeled as drug-related or not, along with details on the specific type of drugs and the delivery method. The labeled data is used to train a model that scales this labeling up to millions more messages automatically.

## Methodology

First, a Logistic Regression model was trained for binary classification. Text data was converted to numeric values with the TfidfVectorizer, which weights each term by its term frequency-inverse document frequency (TF-IDF). This transformation enables the model to learn patterns and relationships between words. The model achieved 97% accuracy on the test set.

To handle tasks where a message can carry multiple labels, a MultiOutputClassifier was employed as an extension. This addresses the complexity of associating a text message with multiple categories such as "soft drugs," "hard drugs," and "medicines." One-Hot Encoding was used for the multi-label transformation. Performance was evaluated with Hamming Loss, a metric suited to multi-label classification. The model reached a Hamming Loss of 0.04, meaning 4% of individual label assignments were wrong, i.e. 96% accuracy per label.

### Tools used to train the model

- Python
- scikit-learn
- pandas
- numpy

### How to Get Started with the Model

Use the code below to get started with the model.

```python
from joblib import load

# load the model
clf = load('model.joblib')

# make some predictions (Dutch sample messages: a clothing ad and a drug price list)
text_messages = [
"""
Oud kleding te koop!
Stuur een berichtje
We repareren ook!
""",
"""
COKE/XTC
* 1Gram = €50
* 5Gram = €230
"""]

# column index of the model output -> label name
mapping = {0: "bezorging", 1: "bulk", 2: "designer", 3: "drugsad", 4: "geendrugsad",
           5: "harddrugs", 6: "medicijnen", 7: "pickup", 8: "post", 9: "softdrugs"}

# collect the names of all labels predicted as 1 for each message
labels = []
for message in clf.predict(text_messages):
    label = []
    for idx, labeled in enumerate(message):
        if labeled == 1:
            label.append(mapping[idx])
    labels.append(label)

print(labels)
```

## Details

- **Shared by:** Dutch Public Broadcasting Foundation (NOS)
- **Model type:** text-classification
- **Language:** Dutch
- **License:** Creative Commons Attribution Non Commercial No Derivatives 4.0