Customer Support Tweet Classifier (DistilBERT)

Fine-tuned distilbert-base-uncased for routing customer support tweets into seven categories: billing, technical, account, delivery, product, support, general.

Trained on 50,000 stratified examples from the Customer Support on Twitter dataset.
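The card does not say how the stratified subset was drawn. As a rough sketch of the idea only, the function below samples from each category in proportion to its share of the full data; `stratified_sample` and the `(text, label)` row layout are assumptions made for illustration, not the author's pipeline.

```python
import random
from collections import defaultdict

def stratified_sample(rows, n, seed=0):
    """Return roughly n rows from `rows`, preserving label proportions.

    `rows` is a list of (text, label) pairs (an assumed layout).
    Per-class counts are rounded, so the total can differ from n by a few rows.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    total = len(rows)
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        # Each class contributes its proportional share of the n examples.
        k = round(n * len(group) / total)
        sample.extend(rng.sample(group, min(k, len(group))))
    rng.shuffle(sample)
    return sample
```

With an 80/20 class split in the input, a request for 10 rows yields 8 of the majority class and 2 of the minority class.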

Results

  • Accuracy: 99.5%
  • Macro F1: 0.991
  • Weighted F1: 0.994

Evaluated on a held-out test set of 307,569 tweets. The full analysis and a side-by-side comparison with a TF-IDF + Logistic Regression baseline are in the GitHub repository.
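For readers unsure how the three reported metrics differ, here is a minimal pure-Python sketch of accuracy, macro F1 and weighted F1. The function name `f1_report` is made up for this example, and the evaluation code in the repository may well use a library instead.

```python
def f1_report(y_true, y_pred):
    """Return (accuracy, macro_f1, weighted_f1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    macro = 0.0
    weighted = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn                 # number of true examples of this class
        macro += f1 / len(labels)         # every class counts equally
        weighted += f1 * support / n      # classes count by their frequency
    return accuracy, macro, weighted
```

Macro F1 gives rare categories the same weight as common ones, so it drops faster than weighted F1 when a small class is predicted poorly.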

Usage

import pickle
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from huggingface_hub import hf_hub_download

REPO = "Vishesh062/customer-support-tweet-classifier"
tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSequenceClassification.from_pretrained(REPO)
model.eval()

# Label encoder maps class indices to category names
le_path = hf_hub_download(repo_id=REPO, filename="label_encoder.pkl")
with open(le_path, "rb") as f:
    le = pickle.load(f)

def classify(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return le.inverse_transform([logits.argmax(-1).item()])[0]

classify("I've been overcharged on my last bill")
# → 'billing'
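When routing, it can help to see the runner-up categories as well as the top one. The helper below is a sketch that turns one row of logits into ranked (category, probability) pairs with a softmax; `top_k` is a hypothetical name, and it takes plain Python lists so it does not depend on the model objects.

```python
import math

def top_k(logits, class_names, k=3):
    """Rank categories by softmax probability for a single example.

    `logits` is a list of floats and `class_names` the matching category
    names, in the same order as the model's output indices.
    """
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

With the objects from the snippet above, a call might look like `top_k(logits[0].tolist(), list(le.classes_))`. This assumes `le` is a scikit-learn `LabelEncoder`, which the card does not state explicitly.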

Limitations

The training labels are synthetic: they were generated by keyword matching, not human annotation. This model is therefore a keyword detector with extra steps, and the scores above measure agreement with the keyword rules rather than real-world routing quality. Real-world deployment needs human-annotated training data to be meaningful.
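To make the limitation concrete, here is a minimal sketch of what keyword-based labeling can look like. The keyword lists and the `keyword_label` function are hypothetical, invented for illustration; the actual rules used to label the training data are in the GitHub repository.

```python
# Hypothetical keyword lists; the real labeling rules are not given in this card.
KEYWORDS = {
    "billing": ("bill", "charge", "refund", "invoice"),
    "delivery": ("deliver", "shipping", "package", "tracking"),
    "account": ("login", "password", "account"),
}

def keyword_label(text, default="general"):
    """Assign the first category whose keyword appears in the text."""
    lowered = text.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return default

print(keyword_label("I've been overcharged on my last bill"))  # billing
print(keyword_label("My card was debited twice"))              # general
```

The second tweet is plainly a billing complaint, yet it contains none of the listed keywords and falls through to the default. A model trained on labels like these learns the same blind spots.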

See the GitHub repository for the full methodology, baseline comparison, error analysis, and known limitations.

Citation

If you use this model, please link to the main repository.

Model size: 67M params (Safetensors, tensor type F32)