Spam Detection System

Lite Model

Introduction

The Lite model is a streamlined variant with optimized parameters and additional engineered text features, designed for fast, efficient spam detection.

Features

  • Text Preprocessing: Lemmatization, removal of stop words and punctuation.
  • Feature Extraction: Text length, word count, unique word count, uppercase character count, and special character count.
  • Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
  • Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
  • Metrics Saving: Accuracy, precision, and F1 score.
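As a rough illustration of the Lite model's feature extraction step, the five counts listed above can be computed from a raw message with the standard library alone. This is a sketch, not the code in training/train_model_lite.py: the function name `extract_features`, whitespace tokenization, case-insensitive unique-word counting, and treating `string.punctuation` as the set of "special characters" are all assumptions.

```python
import string
from typing import Dict


def extract_features(text: str) -> Dict[str, int]:
    """Return the five hand-crafted counts used as extra features (sketch)."""
    words = text.split()  # assumption: simple whitespace tokenization
    # Assumption: words are compared case-insensitively, ignoring surrounding punctuation.
    distinct = {w.strip(string.punctuation).lower() for w in words}
    distinct.discard("")  # tokens made only of punctuation are not counted as words
    return {
        "text_length": len(text),  # number of characters, spaces included
        "word_count": len(words),
        "unique_word_count": len(distinct),
        "uppercase_count": sum(ch.isupper() for ch in text),
        # Assumption: "special characters" means ASCII punctuation.
        "special_char_count": sum(ch in string.punctuation for ch in text),
    }
```

For example, extract_features("Free entry! FREE entry now") reports a text length of 26, 5 words, 3 unique words, 5 uppercase characters, and 1 special character. Counts like these are typically appended to the vectorized text as extra columns.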

How to Run

  1. Train the Model:
    python training/train_model_lite.py
    
  2. Use the Model:
    import joblib
    model = joblib.load('models/model.pkl')
    vectorizer = joblib.load('models/vectorizer.pkl')
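The snippet above only loads the saved artifacts. A minimal sketch of classifying new messages with them follows, assuming vectorizer.pkl holds a scikit-learn-style vectorizer with `transform()` and model.pkl a classifier with `predict()`; the helper names `load_artifacts` and `classify` are hypothetical. Two caveats the card does not resolve: messages should first go through the same preprocessing used in training (lemmatization for Lite, Porter stemming for Legacy), and if the Lite model appends its engineered count features to the vectorizer output, those columns must be added in the training order before calling `predict()`.

```python
from typing import Any, Iterable, List, Tuple


def load_artifacts(model_path: str = "models/model.pkl",
                   vectorizer_path: str = "models/vectorizer.pkl") -> Tuple[Any, Any]:
    """Load the classifier and vectorizer written by the training script."""
    import joblib  # imported lazily so classify() can be used without joblib installed

    return joblib.load(model_path), joblib.load(vectorizer_path)


def classify(messages: Iterable[str], model: Any, vectorizer: Any) -> List[Any]:
    """Vectorize already-preprocessed messages and return predicted labels."""
    # Assumption: the vectorizer output alone is the model's input matrix.
    features = vectorizer.transform(list(messages))
    return list(model.predict(features))
```

Typical use is model, vectorizer = load_artifacts() followed by classify(["Congratulations, you won a free ticket"], model, vectorizer). The label encoding of the returned predictions (for example 0/1 or ham/spam) depends on how the training script encoded the dataset.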
    

Legacy Model

Introduction

The Legacy model keeps the original, unoptimized model logic and parameters, but restructures the code and adds visualizations.

Features

  • Text Preprocessing: Porter stemming, removal of stop words and punctuation.
  • Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
  • Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
  • Metrics Saving: Accuracy and precision.
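Both variants build an ensemble of SVC, MultinomialNB, and ExtraTreesClassifier, but the card does not say how the three are combined or which parameters the Legacy version uses. The sketch below assumes scikit-learn's `VotingClassifier` with soft voting (averaging the members' predicted class probabilities) and otherwise default parameters; the function name `build_ensemble` is hypothetical and this is not the project's actual configuration.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC


def build_ensemble(random_state: int = 42) -> VotingClassifier:
    """Soft-voting ensemble of SVC, MultinomialNB and ExtraTreesClassifier (sketch)."""
    return VotingClassifier(
        estimators=[
            # probability=True makes SVC expose predict_proba, which soft voting needs.
            ("svc", SVC(probability=True, random_state=random_state)),
            # MultinomialNB expects non-negative features such as counts or TF-IDF weights.
            ("nb", MultinomialNB()),
            ("et", ExtraTreesClassifier(random_state=random_state)),
        ],
        voting="soft",  # average the members' class probabilities
    )
```

The ensemble is fitted on the vectorizer's output, for example build_ensemble().fit(vectorizer.fit_transform(texts), labels). Hard voting (voting="hard") would take a majority of predicted labels instead and would not require probability=True on SVC.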

How to Run

  1. Train the Model:
    python training/train_model_legacy.py
    
  2. Use the Model:
    import joblib
    model = joblib.load('models/model.pkl')
    vectorizer = joblib.load('models/vectorizer.pkl')
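Training with either script also saves evaluation metrics: accuracy and precision for the Legacy model, plus the F1 score for the Lite model. For reference, here is a standard-library sketch of how those three numbers are computed from true and predicted labels, treating label 1 as spam (an assumption about the label encoding). The helper `save_metrics` and its JSON format are hypothetical, since the card does not say how or where the metrics are stored.

```python
import json
from typing import Dict, Hashable, Sequence


def spam_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy, plus precision and F1 for the spam (positive) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam flagged as spam
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam that was missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


def save_metrics(metrics: Dict[str, float], path: str) -> None:
    """Write metrics as JSON (hypothetical format; the card does not specify one)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metrics, f, indent=2)
```

Precision matters for a spam filter because a false positive sends a legitimate message to spam; F1 balances that against recall (missed spam). For binary labels these values should match scikit-learn's accuracy_score, precision_score, and f1_score with pos_label=1.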
    

Additional Information

  • Dependencies: Python 3.6 or higher, pip, and the packages listed in requirements.txt.
  • Dataset: The dataset used for training is spam.csv.
  • Contact and Support: For questions or support, please contact the project maintainers.
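The card does not describe the layout of spam.csv. If it is the widely used SMS Spam Collection export from Kaggle, it has a label column v1 (ham or spam), a text column v2, and Latin-1 encoding; the loader below assumes that layout, so adjust the defaults if the file differs. The function name `load_spam_csv` and the mapping of spam to 1 and ham to 0 are hypothetical choices for illustration.

```python
import csv
from typing import List, Tuple


def load_spam_csv(path: str = "spam.csv", label_col: str = "v1", text_col: str = "v2",
                  encoding: str = "latin-1") -> Tuple[List[str], List[int]]:
    """Read messages and binary labels (spam=1, ham=0) from a CSV file."""
    texts: List[str] = []
    labels: List[int] = []
    # Assumption: Kaggle-style layout with label/text columns v1/v2, Latin-1 encoded.
    with open(path, newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):
            texts.append(row[text_col])
            labels.append(1 if row[label_col].strip().lower() == "spam" else 0)
    return texts, labels
```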

For more details, refer to README.md and models.md.

