Customer Support Tweet Classifier (DistilBERT)

Fine-tuned distilbert-base-uncased for routing customer support tweets into seven categories: billing, technical, account, delivery, product, support, general.

Trained on a stratified sample of 50,000 examples from the Customer Support on Twitter dataset.

Results

Accuracy: 99.5%
Macro F1: 0.991
Weighted F1: 0.994

Evaluated on a held-out test set of 307,569 tweets. The full analysis and a side-by-side comparison with a TF-IDF + Logistic Regression baseline are in the GitHub repository.
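
For readers unfamiliar with the two F1 variants reported above, here is a minimal pure-Python sketch of how macro and weighted F1 differ. The toy labels and the helper names (per_class_f1, macro_f1, weighted_f1) are illustrative assumptions; this is not the evaluation code behind the reported numbers.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores

def macro_f1(scores):
    # Every class counts equally, regardless of how many examples it has.
    return sum(f1 for f1, _ in scores.values()) / len(scores)

def weighted_f1(scores):
    # Each class is weighted by its number of true examples (support).
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total

# Toy, imbalanced data (made up for illustration): one rare-class error.
y_true = ["billing"] * 8 + ["delivery"] * 2
y_pred = ["billing"] * 8 + ["billing", "delivery"]
scores = per_class_f1(y_true, y_pred)
print(round(macro_f1(scores), 3), round(weighted_f1(scores), 3))
```

On imbalanced data, a mistake on a rare class lowers macro F1 more than weighted F1, which is why the two are usually reported together.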

Usage

import pickle
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from huggingface_hub import hf_hub_download

REPO = "Vishesh062/customer-support-tweet-classifier"
tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSequenceClassification.from_pretrained(REPO)
model.eval()

# Label encoder maps class indices to category names
le_path = hf_hub_download(repo_id=REPO, filename="label_encoder.pkl")
with open(le_path, "rb") as f:
    le = pickle.load(f)

def classify(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return le.inverse_transform([logits.argmax(-1).item()])[0]

classify("I've been overcharged on my last bill")
# → 'billing'
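
The classify function above returns only the top label. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below does this in pure Python on a plain list of floats (for example, what logits[0].tolist() would give inside classify). The LABELS order and the example logits are assumptions for illustration; the real index-to-name mapping comes from the label encoder.

```python
import math

# Assumed label order, for illustration only; the real mapping
# is defined by the label encoder shipped with the model.
LABELS = ["account", "billing", "delivery", "general",
          "product", "support", "technical"]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Made-up logits for a single tweet.
example_logits = [0.1, 4.2, -1.0, 0.3, -0.5, 0.0, 0.7]
label, confidence = top_label(example_logits)
print(label, round(confidence, 3))
```

A low top probability can be used to route a tweet to manual review instead of trusting the predicted category.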

Limitations

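The training labels are synthetic: they were generated by keyword matching, not human annotation. The model is therefore a keyword detector with extra steps, and the scores above most likely measure how well it reproduces those keyword rules rather than how well it routes real support requests. A real-world deployment would need human-annotated training data.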
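
To make the limitation concrete, here is a minimal sketch of what keyword-based labelling can look like. The keyword lists and the keyword_label helper are hypothetical; the actual rules used to label the training data are described in the GitHub repository, not here.

```python
# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "billing": ["bill", "charge", "refund", "invoice"],
    "delivery": ["deliver", "shipping", "package", "tracking"],
    "account": ["login", "password", "account"],
}

def keyword_label(text, default="general"):
    """Return the first category whose keyword occurs in the text."""
    lowered = text.lower()
    for label, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return label
    return default

# A tweet containing a listed keyword gets the expected label...
print(keyword_label("I've been overcharged on my last bill"))
# ...but a paraphrase with no listed keyword falls through to the default.
print(keyword_label("My parcel never showed up"))
```

A model trained on labels like these can learn little beyond the rules themselves, so a delivery complaint phrased without any listed keyword is labelled, and likely predicted, as general.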

See the GitHub repository for the full methodology, baseline comparison, error analysis, and known limitations.

Citation

If you use this model, please link to the main repository.

Model format: Safetensors, 67M parameters, F32 tensors.