Model Overview

This model is a fine-tuned version of BERT designed to classify SMS messages as either spam or not spam. It was developed as part of a technical test for the startup IntiGo.

Model Details

  • Model Name: BERT Fine-Tuned for SMS Spam Classification
  • Library: Transformers
  • Language: English
  • Pipeline Tag: text-classification

License

This model is released under the MIT License.

Fine-Tuning Procedure

This model was fine-tuned on the SMS Spam Collection dataset. The dataset contains a collection of SMS messages labeled as "spam" or "ham" (not spam).

Metrics

  • Precision: 0.99
  • Recall: 0.81
  • F1 Score: 0.96

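These metrics were computed on the validation set. Assuming spam is the positive class, the high precision means that messages flagged as spam are rarely legitimate, while the lower recall means that roughly one in five spam messages goes undetected. The reported F1 score is higher than the harmonic mean of the listed precision and recall (about 0.89), so the three figures may have been computed with different averaging schemes.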
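
The relationship between the three figures can be checked directly, since F1 is defined as the harmonic mean of precision and recall. The snippet below is only a sanity-check sketch: the function name is illustrative, and it says nothing about how the reported numbers were actually produced.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in the Metrics section
implied_f1 = f1_from_precision_recall(0.99, 0.81)
print(f"F1 implied by P=0.99, R=0.81: {implied_f1:.3f}")
```

If the listed values describe different classes or averaging modes (for example, per-class versus weighted), the implied value need not match the reported F1.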

Usage

You can use this model to classify SMS messages into spam or not spam. The model accepts raw text input and outputs a label prediction.

Example:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
model_name = "Amenallah2001/intigo-technical-test"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example input
text = "Congratulations! You've won a free ticket to Bahamas. Call now!"

# Tokenize (truncating over-long inputs) and classify without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape: (1, num_labels)
predicted_class = logits.argmax(dim=-1).item()

# Output prediction
label_map = {0: "ham", 1: "spam"}
print(f"Prediction: {label_map[predicted_class]}")
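
Taking the argmax discards how confident the model is. As a minimal sketch, the logits can be passed through a softmax to obtain a spam probability, which then allows a stricter decision threshold. The helper names, the threshold value, and the example logits below are illustrative assumptions; only the [ham, spam] ordering is taken from the label map above.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, spam_threshold=0.5):
    """Return (label, spam_probability) for one message's [ham, spam] logits."""
    spam_probability = softmax(logits)[1]
    label = "spam" if spam_probability >= spam_threshold else "ham"
    return label, spam_probability


# Hypothetical logits for a single message, ordered as [ham, spam]
example_logits = [-1.2, 2.3]
label, prob = classify_with_threshold(example_logits, spam_threshold=0.9)
print(label, round(prob, 3))
```

With the real model, the input list could be obtained from the example above as `outputs.logits[0].tolist()`.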

Intended Use

This model is intended for detecting spam in SMS messages. It can be integrated into systems that require spam detection, such as messaging apps or SMS gateways.
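
One way such an integration might look is sketched below: a routing function receives a classifier as a plain callable and splits incoming messages into delivered and quarantined lists. Everything here is hypothetical (`route_messages`, `keyword_stub`, and the quarantine policy are not part of the model or of any gateway API); the stub classifier merely stands in for a call to the model.

```python
from typing import Callable, Iterable, List, Tuple


def route_messages(
    messages: Iterable[str], classify: Callable[[str], str]
) -> Tuple[List[str], List[str]]:
    """Split messages into (delivered, quarantined) using a classifier callable."""
    delivered, quarantined = [], []
    for message in messages:
        if classify(message) == "spam":
            quarantined.append(message)
        else:
            delivered.append(message)
    return delivered, quarantined


def keyword_stub(message: str) -> str:
    """Stand-in classifier for illustration only; a real system would call the model."""
    return "spam" if "free ticket" in message.lower() else "ham"


delivered, quarantined = route_messages(
    ["See you at 6?", "You've won a FREE TICKET. Call now!"],
    keyword_stub,
)
print(delivered, quarantined)
```

Passing the classifier in as a callable keeps the routing logic independent of how predictions are produced, so the stub can be swapped for a function that wraps the model.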

Limitations

  • Data Imbalance: The dataset used for training was imbalanced, which could affect the model’s performance in real-world scenarios with different distributions of spam and non-spam messages.
  • Language Support: This model was fine-tuned on English text only and may not perform well on SMS messages in other languages.
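
To illustrate the data imbalance point, the sketch below shows how precision shifts when the share of spam in live traffic differs from the evaluation data, even if the model itself is unchanged. The recall is taken from the Metrics section; the false positive rate and both prevalence values are hypothetical assumptions chosen only for illustration.

```python
def precision_at_prevalence(
    recall: float, false_positive_rate: float, prevalence: float
) -> float:
    """Expected precision when a fraction `prevalence` of messages is spam."""
    true_positives = recall * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


RECALL = 0.81        # from the Metrics section
ASSUMED_FPR = 0.002  # hypothetical: share of ham messages wrongly flagged as spam

# Hypothetical spam shares: a spam-heavy dataset versus sparse live traffic
for prevalence in (0.13, 0.01):
    precision = precision_at_prevalence(RECALL, ASSUMED_FPR, prevalence)
    print(f"spam share {prevalence:.0%}: precision {precision:.3f}")
```

Under these assumptions, the rarer spam becomes, the larger the share of flagged messages that are actually legitimate.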

Ethical Considerations

When using this model, be mindful of privacy concerns and ensure that the deployment complies with relevant regulations, especially in handling user-generated content.

Model size: 109M params (Safetensors, tensor type F32)