
BERT Base Uncased Quantized Model for Spam Detection

This repository hosts a quantized version of the BERT model, fine-tuned for spam detection tasks. The model has been optimized for efficient deployment while maintaining high accuracy, making it suitable for resource-constrained environments.

Model Details

  • Model Architecture: BERT Base Uncased
  • Task: Spam Email Detection
  • Dataset: Hugging Face's mail_spam_ham_dataset and spam-mail datasets (combined)
  • Quantization: Float16
  • Fine-tuning Framework: Hugging Face Transformers

Usage

Installation

pip install transformers torch

Loading the Model

from transformers import BertTokenizer, BertForSequenceClassification
import torch

model_name = "AventIQ-AI/bert-spam-detection"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def predict_spam_quantized(text):
    """Predicts whether a given text is spam (1) or ham (0) using the quantized BERT model."""
    
    # Tokenize input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

    # Move inputs to GPU (if available)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    
    # Perform inference
    with torch.no_grad():
        outputs = model(**inputs)

    # Get predicted label (0 = ham, 1 = spam)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    
    return "Spam" if prediction == 1 else "Ham"


# Sample test messages
print(predict_spam_quantized("WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only."))
# Expected output: Spam

print(predict_spam_quantized("WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only."))
# Expected output: Ham
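
To score many messages at once, the same tokenizer and model can be applied to a batch. The sketch below builds on the loading code above; the predict_spam_batch name and the example messages are illustrative, not part of the released model.

def predict_spam_batch(texts):
    """Classifies a list of texts in one forward pass; returns 'Spam'/'Ham' per text."""
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    predictions = torch.argmax(logits, dim=1).tolist()
    return ["Spam" if p == 1 else "Ham" for p in predictions]

print(predict_spam_batch([
    "Congratulations! You have won a free cruise. Reply YES to claim.",
    "Can you send me the meeting notes from this morning?",
]))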

📊 Classification Report (Quantized Model - float16)

Metric      Class 0 (Non-Spam)   Class 1 (Spam)   Macro Avg   Weighted Avg
Precision   1.00                 0.98             0.99        0.99
Recall      0.99                 0.99             0.99        0.99
F1-Score    0.99                 0.99             0.99        0.99
Accuracy    99%                  99%              99%         99%

πŸ” Observations

βœ… Precision: High (1.00 for non-spam, 0.98 for spam) β†’ Few false positives
βœ… Recall: High (0.99 for both classes) β†’ Few false negatives
βœ… F1-Score: Near-perfect balance between precision & recall
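
A report like the one above can be regenerated on a labeled test split with scikit-learn (install it separately with pip install scikit-learn). This is a minimal sketch; the test_texts and test_labels variables are placeholders for your own held-out evaluation data, not a dataset shipped with this repository.

from sklearn.metrics import classification_report

# Hypothetical evaluation split; replace with your own held-out texts and labels (0 = ham, 1 = spam).
test_texts = ["Free entry in a weekly comp to win FA Cup final tickets!", "See you at dinner tonight?"]
test_labels = [1, 0]

predicted_labels = [1 if predict_spam_quantized(t) == "Spam" else 0 for t in test_texts]
print(classification_report(test_labels, predicted_labels, target_names=["Ham", "Spam"]))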

Fine-Tuning Details

Dataset

The Hugging Face spam-mail and mail_spam_ham_dataset datasets were combined and used for fine-tuning; together they contain both spam and ham (non-spam) examples.
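
One way to combine the two datasets is with the datasets library, as sketched below. The repository IDs and column names here are assumptions (the card does not give the full Hub paths), so substitute the exact datasets you use.

from datasets import load_dataset, concatenate_datasets

# Placeholder repository IDs for the mail_spam_ham_dataset and spam-mail datasets mentioned above.
ds_a = load_dataset("path/to/mail_spam_ham_dataset", split="train")
ds_b = load_dataset("path/to/spam-mail", split="train")

# Assumes both datasets expose compatible "text" and "label" columns (0 = ham, 1 = spam).
combined = concatenate_datasets([ds_a, ds_b]).shuffle(seed=42)
print(combined)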

Training

  • Number of epochs: 3
  • Batch size: 8
  • Evaluation strategy: epoch
  • Learning rate: 2e-5

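A minimal sketch of how these hyperparameters map onto Hugging Face TrainingArguments and Trainer is shown below. It is not the exact training script: it assumes the combined dataset from the sketch above and starts from a fresh bert-base-uncased checkpoint.

from transformers import BertForSequenceClassification, TrainingArguments, Trainer

# Start fine-tuning from the base checkpoint with a 2-class head.
base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

tokenized = combined.map(tokenize, batched=True)
split = tokenized.train_test_split(test_size=0.2, seed=42)

training_args = TrainingArguments(
    output_dir="bert-spam-detection",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
)

trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
)
trainer.train()
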
Quantization

Post-training quantization was applied with PyTorch, converting the fine-tuned weights to float16 to reduce the model size and improve inference efficiency.
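
Because the released weights are float16, one common way to perform this step is simply to cast the fine-tuned model to half precision before saving. A minimal sketch, assuming the fine-tuned full-precision model from the training sketch above (base_model); the output directory name is illustrative.

# Cast all weights to float16 (post-training quantization to half precision).
quantized_model = base_model.half()

# Persist the quantized weights and tokenizer for later use.
quantized_model.save_pretrained("bert-spam-detection-fp16")
tokenizer.save_pretrained("bert-spam-detection-fp16")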

Repository Structure

.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation

Limitations

  • The model may not generalize well to domains outside the fine-tuning dataset.
  • Quantization may result in minor accuracy degradation compared to full-precision models.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
