EcomIntent — E-commerce Support Ticket Intent Classifier
Fine-tuned DistilBERT for classifying e-commerce customer support tickets into 9 intent categories. On the held-out test split it outperforms zero-shot GPT-4o mini at roughly 15x lower cost and 50x lower latency (see Evaluation Results).
Live API: RapidAPI Listing | GitHub: ecomintent-api
Model Description
EcomIntent is a DistilBERT-base model fine-tuned on two Bitext e-commerce/support datasets augmented with Claude-generated synthetic examples. It classifies short English text (support tickets, chat messages, emails) into one of 9 intent categories specific to e-commerce customer support.
Base model: distilbert-base-uncased
Task: Multi-class text classification (9 classes)
Language: English
License: Apache 2.0
Intended Uses
- Primary use: Routing incoming support tickets to the correct queue or agent
- Secondary uses:
  - Triggering automated responses (WISMO → send tracking link)
  - Analytics and intent volume dashboards
  - Gorgias / Zendesk / Shopify Inbox integrations
  - Preprocessing before LLM-based response generation
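As a rough illustration of the routing use case, the sketch below maps a predicted intent to a queue and falls back to manual triage when confidence is low. The queue names, the `route_ticket` helper, and the 0.80 threshold are assumptions made for this example and are not part of EcomIntent; the inputs mirror the `intent` and `confidence` fields returned by the hosted API.

```python
# Hypothetical queue names; replace with the queues in your own helpdesk.
QUEUES = {
    "WISMO": "shipping",
    "RETURN_REQUEST": "returns",
    "EXCHANGE_REQUEST": "returns",
    "CANCEL_ORDER": "orders",
    "DAMAGED_ITEM": "claims",
    "BILLING_DISPUTE": "billing",
    "PRODUCT_QUESTION": "presales",
    "ACCOUNT_ISSUE": "accounts",
    "OTHER": "general",
}


def route_ticket(intent: str, confidence: float, threshold: float = 0.80) -> str:
    """Return a queue name, or "manual_triage" for low-confidence or unknown intents."""
    if confidence < threshold or intent not in QUEUES:
        return "manual_triage"
    return QUEUES[intent]


print(route_ticket("WISMO", 0.9872))          # shipping
print(route_ticket("BILLING_DISPUTE", 0.55))  # manual_triage
```

The threshold is best tuned on a sample of your own tickets, since confidence scores on production text may be less well calibrated than on the test split.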
Out-of-Scope Uses
- Non-English support tickets (model is English-only)
- Multi-intent messages — v1 assigns a single intent; if a message contains multiple intents, the highest-confidence one is returned
- Sensitive classification tasks (medical, legal, financial decisions)
- Domains outside e-commerce customer support
Intent Taxonomy
| Label | Description | Example |
|---|---|---|
| WISMO | Where is my order / tracking / delivery status | "where is my package, it's been 5 days" |
| RETURN_REQUEST | Customer wants to return for refund | "I need to return these shoes for a refund" |
| EXCHANGE_REQUEST | Customer wants different size/color/variant | "can I swap this for a size large?" |
| CANCEL_ORDER | Cancel before shipment | "please cancel order #12345 immediately" |
| DAMAGED_ITEM | Broken, wrong, or missing item arrived | "my item arrived completely smashed" |
| BILLING_DISPUTE | Charge issues, refund status, payment problems | "I was charged twice for the same order" |
| PRODUCT_QUESTION | Specs, sizing, compatibility, availability | "does this fit a 2022 MacBook Pro?" |
| ACCOUNT_ISSUE | Login, password, account access | "I can't log into my account" |
| OTHER | Catch-all: OOS, greetings, spam | "hi there, quick question" |
Training Data
| Source | Examples | License |
|---|---|---|
| Bitext Retail Ecommerce | ~26,000 | CDLA-Sharing 1.0 |
| Bitext Customer Support | ~27,000 | CDLA-Sharing 1.0 |
| Synthetic (Claude claude-sonnet-4-6) | ~2,250 | Apache 2.0 |
Bitext source labels were mapped to the 9-class taxonomy. Synthetic data was generated via Claude API to fill gaps in underrepresented classes (EXCHANGE_REQUEST, OTHER, PRODUCT_QUESTION).
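The exact label mapping is not listed here. A minimal sketch of what such a mapping could look like follows. The source label names are written in the style of the Bitext customer-support intents and should be checked against the datasets. The target assignments and the `map_label` helper are illustrative assumptions, not the mapping actually used for training.

```python
# Assumed, illustrative mapping from source intent names to the 9-class taxonomy.
# Verify the source names against the Bitext datasets before reuse.
SOURCE_TO_TAXONOMY = {
    "track_order": "WISMO",
    "delivery_period": "WISMO",
    "cancel_order": "CANCEL_ORDER",
    "get_refund": "RETURN_REQUEST",
    "payment_issue": "BILLING_DISPUTE",
    "check_invoice": "BILLING_DISPUTE",
    "recover_password": "ACCOUNT_ISSUE",
    "registration_problems": "ACCOUNT_ISSUE",
}


def map_label(source_label: str) -> str:
    """Map a source label to the taxonomy; unmapped labels fall into OTHER."""
    return SOURCE_TO_TAXONOMY.get(source_label, "OTHER")


print(map_label("track_order"))    # WISMO
print(map_label("unknown_label"))  # OTHER
```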
Train/Val/Test split: 80/10/10, stratified by class.
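For readers who want to reproduce a split of this shape, here is a standard-library sketch of an 80/10/10 split stratified by class. The `stratified_split` helper, the seed, and the toy data are assumptions for illustration; the original split may have been produced with a different tool.

```python
import random
from collections import defaultdict


def stratified_split(examples, seed=42, train_pct=80, val_pct=10):
    """Split (text, label) pairs into train/val/test, preserving class proportions."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(items) * train_pct // 100
        n_val = len(items) * val_pct // 100
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # the remainder goes to test
    return train, val, test


# Toy data: 100 WISMO and 50 OTHER examples.
data = [(f"t{i}", "WISMO") for i in range(100)] + [(f"o{i}", "OTHER") for i in range(50)]
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))  # 120 15 15
```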
Training Procedure
Hyperparameters
| Parameter | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Epochs | 8 (with early stopping, patience=3) |
| Batch size | 32 |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.1 |
| Weight decay | 0.01 |
| Max token length | 128 |
| Mixed precision | fp16 |
| Hardware | NVIDIA RTX 4080 16GB |
| Training time | ~25–40 min |
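To make the scheduler rows concrete, the sketch below computes the learning rate at a given step, assuming the common shape of linear warmup followed by cosine decay to zero (the shape implemented by `get_cosine_schedule_with_warmup` in Transformers). The `lr_at_step` helper and the total step count are hypothetical; only the peak rate of 2e-5 and the warmup ratio of 0.1 come from the table.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = 1000  # hypothetical number of optimizer steps
for s in (0, 50, 100, 550, 1000):
    print(s, f"{lr_at_step(s, total):.2e}")
```

Under these assumptions the rate peaks at the end of warmup (step 100 here), passes through half the peak midway through the decay phase, and reaches zero at the final step.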
Evaluation Results
Test Set Performance
| Model | Accuracy | Weighted F1 | P95 Latency | Cost/1k calls |
|---|---|---|---|---|
| EcomIntent (ours) | 99.92% | 0.9992 | ~8ms | ~$0.001 |
| GPT-4o mini (zero-shot) | 84.5% | 0.840 | ~450ms | $0.015 |
| GPT-4o mini (5-shot) | 88.0% | 0.875 | ~700ms | $0.045 |
| Forethought Triage | ~88.5% | ~0.880 | ~300ms | $30k+/yr flat |
Per-Class F1
| Intent | F1 | Precision | Recall |
|---|---|---|---|
| WISMO | 0.9989 | 1.0000 | 0.9979 |
| RETURN_REQUEST | 1.0000 | 1.0000 | 1.0000 |
| EXCHANGE_REQUEST | 1.0000 | 1.0000 | 1.0000 |
| CANCEL_ORDER | 0.9979 | 0.9958 | 1.0000 |
| DAMAGED_ITEM | 0.9989 | 0.9979 | 1.0000 |
| BILLING_DISPUTE | 0.9985 | 1.0000 | 0.9970 |
| PRODUCT_QUESTION | 1.0000 | 1.0000 | 1.0000 |
| ACCOUNT_ISSUE | 0.9995 | 0.9989 | 1.0000 |
| OTHER | 0.9984 | 0.9979 | 0.9989 |
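The F1 column is the harmonic mean of precision and recall. The short check below recomputes it for two rows of the table; the `f1_score` helper is written for this example. Small last-digit differences can appear for other rows because the table's precision and recall values are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the WISMO and CANCEL_ORDER rows of the table.
print(round(f1_score(1.0000, 0.9979), 4))  # 0.9989
print(round(f1_score(0.9958, 1.0000), 4))  # 0.9979
```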
Note on evaluation methodology: These numbers are measured on a held-out 10% test split of the Bitext retail + support datasets, which are themselves synthetically generated from a fixed template inventory. Train and test share the same template distribution. Real-world accuracy on production e-commerce tickets is estimated at 87–93% based on the generalization gap observed during development. We recommend benchmarking against a sample of your own ticket data before relying on these figures for SLA commitments.
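One way to run such a benchmark is sketched below: score a small hand-labeled sample of your own tickets and count the most frequent confusions. The `evaluate` helper, the keyword-based stand-in predictor, and the sample tickets are all made up for illustration; in practice `predict` would wrap a call to the model or the hosted API.

```python
from collections import Counter


def evaluate(predict, labeled_tickets):
    """Return overall accuracy and a Counter of (gold, predicted) error pairs."""
    errors = Counter()
    correct = 0
    for text, gold in labeled_tickets:
        pred = predict(text)
        if pred == gold:
            correct += 1
        else:
            errors[(gold, pred)] += 1
    accuracy = correct / len(labeled_tickets) if labeled_tickets else 0.0
    return accuracy, errors


# Stand-in predictor for demonstration only; swap in the real classifier.
def keyword_predict(text):
    return "WISMO" if "order" in text or "package" in text else "OTHER"


sample = [
    ("where is my package", "WISMO"),
    ("cancel my order please", "CANCEL_ORDER"),
    ("hi there", "OTHER"),
    ("track my order", "WISMO"),
]
acc, errs = evaluate(keyword_predict, sample)
print(acc)                  # 0.75
print(errs.most_common(1))  # [(('CANCEL_ORDER', 'WISMO'), 1)]
```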
Confusion Matrix
How to Use
With the Hosted API (Recommended)
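import requests

response = requests.post(
    "https://YOUR_MODAL_ENDPOINT/classify",
    json={"text": "where is my order, it's been 5 days"},
    timeout=10,  # avoid hanging indefinitely on network problems
)
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()
print(result["intent"])      # "WISMO"
print(result["confidence"])  # 0.9872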
With the Transformers Library
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="JohnBolognino/ecomintent-distilbert",
    top_k=None,
)
result = classifier("where is my order it's been a week")
# Returns list of {label, score} for all 9 classes
top = max(result[0], key=lambda x: x["score"])
print(top["label"])  # "WISMO"
print(top["score"])  # 0.9872
Batch Inference
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="JohnBolognino/ecomintent-distilbert",
    device=0,  # GPU
    top_k=1,
)
tickets = [
    "where is my package",
    "I need to return this item",
    "cancel my order please",
]
results = classifier(tickets, batch_size=32)
for ticket, result in zip(tickets, results):
    print(f"{result[0]['label']}: {ticket}")
Limitations
- English only: performance degrades significantly on non-English text
- Single intent: one prediction per message; multi-intent messages get the dominant class
- Short texts: optimized for support tickets (3–200 words); training used a 128-token maximum, so the tail of longer messages may be truncated
- Domain-specific: trained on e-commerce data; may underperform on other support domains (SaaS, healthcare, etc.)
- Training data bias: the Bitext datasets are themselves synthetic, so real-world ticket distributions will differ; expect lower accuracy on production tickets than on the test split (see the note on evaluation methodology)
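Because only one intent is returned, it can help to flag messages where a second class also receives a sizeable score. The sketch below is a heuristic, not a feature of the model: the `flag_multi_intent` helper, the 0.25 threshold, and the example scores are assumptions. A single-label softmax classifier is not trained for multi-label detection, so treat a flag as a hint for human review.

```python
def flag_multi_intent(scores, secondary_threshold=0.25):
    """Return (top_label, other labels whose score meets the threshold)."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top = ranked[0]["label"]
    secondary = [s["label"] for s in ranked[1:] if s["score"] >= secondary_threshold]
    return top, secondary


# Made-up scores in the {label, score} shape produced with top_k=None.
scores = [
    {"label": "RETURN_REQUEST", "score": 0.58},
    {"label": "BILLING_DISPUTE", "score": 0.37},
    {"label": "OTHER", "score": 0.05},
]
print(flag_multi_intent(scores))  # ('RETURN_REQUEST', ['BILLING_DISPUTE'])
```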
Environmental Impact
Training was performed on a single NVIDIA RTX 4080 GPU for approximately 35 minutes. Estimated CO2 emissions: < 0.05 kg CO2 (negligible).
Citation
If you use this model in research, please cite:
@misc{ecomintent2024,
  title={EcomIntent: Fine-tuned DistilBERT for E-commerce Intent Classification},
  author={YOUR_NAME},
  year={2024},
  url={https://huggingface.co/JohnBolognino/ecomintent-distilbert}
}
License
Apache 2.0 — free for commercial use.
Datasets used to train JohnBolognino/ecomintent-distilbert
bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
