SMS Spam Classifier — DistilBERT (Group 36, IIT Jodhpur)
Fine-tuned distilbert-base-uncased for binary SMS spam classification. Achieves 99.35% accuracy and 0.9851 F1 Macro on the held-out test set. This is v2 — the best-performing version by validation loss (0.0292).
Developed as part of the MLOps course, PGD AI Program, IIT Jodhpur.
Model Details
Model Description
- Base model: `distilbert-base-uncased` (66M parameters)
- Task: Binary text classification, Ham (0) vs Spam (1)
- Dataset: UCI SMS Spam Collection (5,159 samples after deduplication)
- Architecture: DistilBERT encoder + linear classification head
- Framework: PyTorch + Hugging Face Transformers
- Training platform: Kaggle (NVIDIA T4 x2 GPU)
- Developed by: MLOps Group 36, IIT Jodhpur
- Model card authors: G25AIT2032 Duggirala Vnaga Ananth
- Contact: g25ait2032@iitj.ac.in
Related Resources
| Resource | Link |
|---|---|
| GitHub Repository | MLOps Group 36 Repository |
| Kaggle Notebook (Final) | mlops-group36-final-v3 |
| W&B Dashboard | MLOPS_Group |
| HF Model — v1 | nagaananth/MLOPS_group-v1 |
| HF Model — v2 ★ Best | nagaananth/MLOPS_group-v2 |
| HF Model — v3 | nagaananth/MLOPS_group-v3 |
| HF Model — v4 | nagaananth/MLOPS_group-v4 |
| Docker Image (GHCR) | ghcr.io/g25ait2032-prog/mlops_group-inference:latest |
| Docker Image (Hub) | dvnananth/mlops-group36:v1 |
How to Get Started
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="nagaananth/MLOPS_group-v2",
)

# Spam example
print(classifier("URGENT! You have won a free iPhone. Click here now."))
# [{'label': 'spam', 'score': 0.9804}]

# Ham example
print(classifier("Hey, are we still meeting for lunch at 12?"))
# [{'label': 'ham', 'score': 0.9982}]
```
Or with full control:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nagaananth/MLOPS_group-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

def predict(text):
    # Use the same truncation length as in training (max_length=128)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Turn the two class logits (ham, spam) into probabilities
    probs = torch.softmax(logits, dim=-1)[0]
    pred_idx = probs.argmax().item()
    return {
        "label": model.config.id2label[pred_idx],
        "score": round(probs[pred_idx].item(), 4),
    }

print(predict("Free prize! Click now to claim your reward."))
# {'label': 'spam', 'score': 0.9897}
```
Training Details
Dataset
UCI SMS Spam Collection, loaded with the Hugging Face `datasets` library (dataset ID `sms_spam`).
| Split | Samples | Ham % | Spam % |
|---|---|---|---|
| Train (70%) | 3,611 | ~87.5 | ~12.5 |
| Validation (15%) | 774 | ~87.5 | ~12.5 |
| Test (15%) | 774 | ~87.5 | ~12.5 |
Preprocessing steps:
- Lowercased and whitespace normalised
- 415 duplicate messages removed (total: 5,159 unique samples)
- Stratified 70/15/15 split with zero-leakage verification
- Tokenized with `AutoTokenizer` for DistilBERT (`truncation=True`, `max_length=128`)
- Labels mapped: `{"ham": 0, "spam": 1}`
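The preprocessing steps above can be sketched in plain Python. This is a minimal illustration and not the group's notebook code. Several details are assumptions: the helper names (`normalise`, `preprocess`, `stratified_split`, `check_no_leakage`), the seed of 42, and deduplicating on the *normalised* text. The sketch expects `(text, label)` pairs with string labels. Tokenisation is left out; it would use the same tokenizer call as the quick-start.

```python
import random
import re
from collections import defaultdict

LABEL2ID = {"ham": 0, "spam": 1}  # label mapping stated in this card

def normalise(text: str) -> str:
    """Lowercase and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def preprocess(rows):
    """Normalise text, drop duplicates (first occurrence wins), map labels to ids."""
    seen, unique = set(), []
    for text, label in rows:
        key = normalise(text)
        if key not in seen:
            seen.add(key)
            unique.append((key, LABEL2ID[label]))
    return unique

def stratified_split(rows, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split (text, label_id) rows into train/val/test, keeping label ratios per split."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # assumed seed, for reproducibility
    splits = ([], [], [])
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        splits[0].extend(group[:n_train])
        splits[1].extend(group[n_train:n_train + n_val])
        splits[2].extend(group[n_train + n_val:])  # remainder goes to test
    return splits

def check_no_leakage(train, val, test):
    """Raise ValueError if any message text appears in more than one split."""
    t, v, s = ({text for text, _ in part} for part in (train, val, test))
    overlap = (t & v) | (t & s) | (v & s)
    if overlap:
        raise ValueError(f"{len(overlap)} message(s) appear in more than one split")
```

Each class is split separately, so the ham/spam ratio (about 87.5/12.5) stays roughly the same in every split. That is what the Ham % and Spam % columns in the dataset table report.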
Hyperparameter Comparison (All Versions)
| Version | Learning Rate | Epochs | Batch Size | Warmup Steps | Weight Decay | Early Stopping (p = patience) | Val Loss | F1 Macro |
|---|---|---|---|---|---|---|---|---|
| v1 | 3e-5 | 3 | 16 | 100 | 0.01 | No | 0.0539 | 0.9849 |
| v2 ★ | 2e-5 | 5 | 32 | 200 | 0.01 | Yes (p=2) | 0.0292 | 0.9851 |
| v3 | 2e-5 | 5 | 32 | 200 | 0.01 | Yes (p=2) | 0.0376 | 0.9851 |
| v4 | 1e-5 | 4 | 16 | 200 | 0.02 | Yes (p=2) | — | — |
v2 was selected as the final deployment model because it had the lowest validation loss (0.0292) of the evaluated versions, which suggests the best generalisation. Its F1 Macro is effectively tied with v1 and v3.
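The sketch below shows how the warm-up settings in this table relate to run length. It counts optimizer steps and evaluates a linear warm-up/decay schedule. It rests on four assumptions, none of which the card states:

- The runs use the Hugging Face `Trainer` default linear schedule, the same shape as `get_linear_schedule_with_warmup`.
- The Warmup column counts optimizer steps.
- The listed batch size is the effective batch size (one device, no gradient accumulation).
- Every epoch runs to completion.

v3 is omitted because its settings match v2.

```python
import math

def total_steps(n_train: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps if every epoch runs to completion (no gradient accumulation)."""
    return math.ceil(n_train / batch_size) * epochs

def linear_warmup_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warm-up from 0 to base_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))

# Hyperparameters copied from the comparison table above (3,611 training samples).
VERSIONS = {
    "v1": dict(lr=3e-5, epochs=3, batch=16, warmup=100),
    "v2": dict(lr=2e-5, epochs=5, batch=32, warmup=200),
    "v4": dict(lr=1e-5, epochs=4, batch=16, warmup=200),
}

for name, hp in VERSIONS.items():
    steps = total_steps(3611, hp["batch"], hp["epochs"])
    share = hp["warmup"] / steps
    print(f"{name}: {steps} steps, warm-up covers {share:.1%} of the schedule")
```

Under these assumptions the script reports 678, 565 and 904 steps for v1, v2 and v4. Warm-up covers about 14.7%, 35.4% and 22.1% of each schedule respectively. So v2 spends roughly a third of its nominal run warming up. If early stopping ends a run early, the decay phase is cut short too.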
Training Configuration (v2)
- Optimizer: AdamW
- Learning rate: 2e-5
- Epochs: 5 (with early stopping, patience=2)
- Batch size: 32 (train), 64 (eval)
- Mixed precision: fp16
- Metric for best model: F1 Weighted
- Infrastructure: Kaggle NVIDIA T4 x2 GPU
- Average training time: ~2 minutes per run
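The early-stopping setup (patience 2, best model chosen by F1 Weighted) can be illustrated with a small pure-Python simulation. It mimics the patience logic of Hugging Face's `EarlyStoppingCallback`, which counts consecutive evaluations without improvement. It is not that class, and it ignores the callback's optional improvement threshold. The sketch assumes one evaluation per epoch and that the best checkpoint is restored at the end, as `load_best_model_at_end=True` does. `early_stopping_run` is a hypothetical helper.

```python
def early_stopping_run(scores, patience=2, greater_is_better=True):
    """Return (best_epoch, last_epoch_run) for a list of per-epoch eval scores.

    Training stops once `patience` consecutive evaluations fail to improve
    on the best score seen so far. Epochs are 1-indexed.
    """
    best_score, best_epoch, bad_evals = None, 0, 0
    for epoch, score in enumerate(scores, start=1):
        improved = best_score is None or (
            score > best_score if greater_is_better else score < best_score
        )
        if improved:
            best_score, best_epoch, bad_evals = score, epoch, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_epoch, epoch  # stop early; restore best_epoch
    return best_epoch, len(scores)

# Synthetic F1 scores (not this model's logs): improvement stalls after epoch 2.
print(early_stopping_run([0.95, 0.97, 0.96, 0.965, 0.99]))
```

With 5 epochs and patience 2, a run can end as early as epoch 3. In the synthetic example, training stops at epoch 4 and keeps the epoch-2 checkpoint, even though epoch 5 would have scored higher.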
Evaluation Results
Test Set Performance (v2 — Best Model)
| Metric | Score |
|---|---|
| Accuracy | 0.9935 |
| F1 Weighted | 0.9935 |
| F1 Macro | 0.9851 |
| Precision | 0.9935 |
| Recall | 0.9935 |
| Validation Loss | 0.0292 |
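F1 Macro (0.9851) sits below accuracy and F1 Weighted (0.9935) because macro averaging gives the small spam class (~12.5% of samples) the same weight as ham. Errors on spam therefore pull it down more. The sketch below computes the three scores from label lists to show the difference. It should agree with scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`. `classification_metrics` is a hypothetical helper, not the group's evaluation code.

```python
def classification_metrics(y_true, y_pred, labels=(0, 1)):
    """Accuracy plus macro- and support-weighted F1 for integer class labels."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    f1, support = {}, {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support[c] = tp + fn  # number of true samples in class c
    return {
        "accuracy": accuracy,
        "f1_macro": sum(f1.values()) / len(labels),                   # every class counts equally
        "f1_weighted": sum(f1[c] * support[c] for c in labels) / n,  # large classes dominate
    }

# Toy labels (not from this model): three ham, one spam, one ham misread as spam.
print(classification_metrics([0, 0, 0, 1], [0, 0, 1, 1]))
```

On this toy input, accuracy is 0.75. Macro F1 (about 0.733) falls below weighted F1 (about 0.767) because the minority class has the lower F1.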
Adversarial Test Cases
The model was evaluated on 15 adversarial/edge-case SMS messages covering spam, ham, and ambiguous phrasing (e.g., messages mixing casual language with spam triggers). Representative examples:
| Text | True | Predicted | Confidence |
|---|---|---|---|
| "URGENT! You have won a 1-week cruise! Call now." | spam | spam | 0.9987 |
| "You won! Click here to claim your prize." | spam | spam | 0.9945 |
| "Hey, are we still meeting for lunch at 12?" | ham | ham | 0.9991 |
| "Can you send me the report by EOD?" | ham | ham | 0.9988 |
| "Meeting for lunch? I won a contest, let's talk." | ham | ham | 0.9756 |
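The full 15-case set is not published in this card, but a harness for rerunning edge cases can be sketched as follows. `run_edge_cases` and `CASES` are hypothetical names, and `CASES` holds only the five representative examples from the table. `predict` can be any callable that returns `{"label": ..., "score": ...}`, such as the `predict` function from the quick-start.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]  # (message text, expected label)

def run_edge_cases(predict: Callable[[str], Dict], cases: List[Case]):
    """Run labelled edge cases through `predict`; return accuracy and mismatches."""
    failures = []
    for text, expected in cases:
        result = predict(text)
        if result["label"] != expected:
            failures.append((text, expected, result["label"], result["score"]))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures

# The five representative cases from the table above.
CASES = [
    ("URGENT! You have won a 1-week cruise! Call now.", "spam"),
    ("You won! Click here to claim your prize.", "spam"),
    ("Hey, are we still meeting for lunch at 12?", "ham"),
    ("Can you send me the report by EOD?", "ham"),
    ("Meeting for lunch? I won a contest, let's talk.", "ham"),
]

# Usage with the quick-start `predict` (requires downloading the model):
# accuracy, failures = run_edge_cases(predict, CASES)
```

Keeping the failing cases, not just the accuracy, makes it easy to check whether a new version still handles mixed messages like the last one, which the table lists as ham despite containing "won".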
Inference Latency (CPU)
- Mean latency: ~30–60 ms per message (single-message inference; varies with CPU hardware)
- Suitable for CPU-only deployment
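CPU latency depends on the processor and thread count, so re-measuring on the target machine is worthwhile. Below is a minimal timing harness, assuming a `predict(text)` callable such as the quick-start function. `measure_latency_ms` is a hypothetical helper.

```python
import math
import statistics
import time
from typing import Callable, Sequence

def measure_latency_ms(predict: Callable[[str], object], texts: Sequence[str],
                       warmup: int = 3, repeats: int = 5) -> dict:
    """Time single-message calls to `predict`; return mean, median and p95 in ms."""
    for text in texts[:warmup]:
        predict(text)  # warm-up calls are excluded from the timings
    timings = []
    for _ in range(repeats):
        for text in texts:
            start = time.perf_counter()
            predict(text)
            timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p95_index = math.ceil(0.95 * len(timings)) - 1  # nearest-rank percentile
    return {
        "mean_ms": statistics.fmean(timings),
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
    }

# Usage with the quick-start `predict`:
# print(measure_latency_ms(predict, ["Free prize! Click now.", "See you at 12?"]))
```

Reporting the median and p95 alongside the mean exposes occasional slow calls. PyTorch's CPU thread setting (`torch.set_num_threads`) can also shift the numbers noticeably.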
Uses
Direct Use
Binary classification of SMS or short-text messages into ham (legitimate) or spam (unsolicited/phishing). Can be directly integrated into messaging applications or notification pipelines.
Downstream Use
Can serve as a component in broader security pipelines for filtering suspicious incoming messages, or as a baseline for transfer learning to other spam-detection domains.
Out-of-Scope Use
- Long-form document classification
- Sentiment analysis or intent detection
- Legal or financial decision-making without human oversight
- Languages other than English
Bias, Risks, and Limitations
Data Bias: Trained on a specific SMS corpus from the early 2010s. May struggle with modern slang, emojis, or evolved phishing techniques not present in the training data.
False Positives: Messages containing spam-adjacent keywords (e.g., "Urgent", "Click", "Won") in legitimate contexts may be misclassified.
Contextual Blindness: Processes each message independently; cannot use conversational context from prior messages.
Phishing Sophistication: Less reliable against highly sophisticated spear-phishing that mimics professional language.
Recommendations
- Notify users when a message is flagged automatically.
- Provide a manual override/report mechanism for misclassifications.
- Monitor for distribution drift and retrain periodically on newer data.
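The recommendations above can be combined into a thin wrapper around the model. This is a hypothetical sketch: `SpamRouter`, the 0.9 threshold and the override log are illustrative, not part of the released model. A threshold on the spam score trades some recall for fewer false positives on keyword-heavy legitimate messages. Tracking the flag rate over time gives a crude drift signal. For example, a sustained move far from the ~12.5% spam share seen in training may warrant a closer look.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SpamRouter:
    """Hypothetical wrapper around a `predict(text) -> {"label", "score"}` callable."""
    predict: Callable[[str], Dict]
    threshold: float = 0.9  # assumed cut-off; tune it on validation data
    routed: int = 0
    flagged_count: int = 0
    overrides: List[Dict] = field(default_factory=list)

    def route(self, text: str) -> Dict:
        """Classify one message and decide whether to flag it as spam."""
        result = self.predict(text)
        flagged = result["label"] == "spam" and result["score"] >= self.threshold
        self.routed += 1
        self.flagged_count += int(flagged)
        # Return the score too, so the UI can tell the user why a message was flagged.
        return {"text": text, "flagged": flagged, "score": result["score"]}

    def report(self, text: str, correct_label: str) -> None:
        """Store a user correction; collected overrides can seed the next retraining set."""
        self.overrides.append({"text": text, "label": correct_label})

    def flag_rate(self) -> float:
        """Share of routed messages that were flagged (a crude drift signal)."""
        return self.flagged_count / self.routed if self.routed else 0.0
```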
Technical Specifications
Model Architecture
- Base: `distilbert-base-uncased` (6 transformer layers, 768 hidden dim, 12 attention heads)
- Classification head: `[CLS]` token's final hidden state → linear (768→768) + ReLU + dropout → linear (768→2 classes)
- Total parameters: ~66M
Compute Infrastructure
- Training: Kaggle Notebooks — NVIDIA T4 x2 GPU
- Libraries: `transformers`, `datasets`, `evaluate`, `accelerate`, `torch`, `wandb`
- Inference: CPU-compatible (no GPU required)
Environmental Impact
- Hardware: NVIDIA T4 GPU (Kaggle)
- Training duration: ~2 minutes per run
- Carbon emitted: < 0.01 kg CO₂eq (estimated via ML Impact Calculator)
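The order of magnitude of this estimate can be checked by hand. The calculation below assumes both T4 GPUs draw their full 70 W board power for the whole ~2-minute run. It also assumes a grid carbon intensity of 0.5 kg CO₂eq/kWh; the calculator's actual regional factor is not stated here.

```latex
\begin{aligned}
E &= 2 \times 70\,\mathrm{W} \times \tfrac{2}{60}\,\mathrm{h} \approx 4.7\,\mathrm{Wh} \approx 0.0047\,\mathrm{kWh} \\
\mathrm{CO_2eq} &\approx 0.0047\,\mathrm{kWh} \times 0.5\,\mathrm{kg/kWh} \approx 0.0023\,\mathrm{kg} < 0.01\,\mathrm{kg}
\end{aligned}
```

Even with both GPUs assumed at full power, a single run comes to roughly a quarter of the stated 0.01 kg bound.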
Citation
```bibtex
@misc{group36-sms-spam-2026,
  author       = {Duggirala Vnaga Ananth and Anukumar K and Shrikrishna Tripathi and Sudeb Ghosh},
  title        = {SMS Spam Classifier: Fine-tuned DistilBERT (Group 36, IIT Jodhpur)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nagaananth/MLOPS_group-v2}}
}
```
Glossary
- Ham: Legitimate, non-spam SMS message
- Spam: Unsolicited commercial or phishing message
- DistilBERT: Distilled version of BERT — 40% smaller, retains 97% of BERT's NLU performance
- F1 Macro: Unweighted mean of per-class F1 scores; useful for evaluating imbalanced datasets
- Fine-tuning: Adapting a pre-trained language model to a task-specific dataset with supervised training