Spam Detection System
Lite Model
Introduction
The Lite model is a streamlined variant with optimized parameters and enhanced feature extraction, designed for fast, efficient spam detection.
Features
- Text Preprocessing: Lemmatization, removal of stop words and punctuation.
- Feature Extraction: Text length, word count, unique word count, uppercase count, special character count.
- Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
- Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
- Metrics Saving: Accuracy, precision, and F1 score.
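The feature-extraction step listed above can be sketched in plain Python. This is a minimal illustration, not the code in the training script: the function name `extract_features`, the dictionary keys, and the exact definitions (uppercase counted per character, "special" meaning ASCII punctuation, uniqueness ignoring case) are all assumptions.

```python
import string


def extract_features(text: str) -> dict:
    """Compute five simple surface features for one message."""
    words = text.split()
    return {
        "text_length": len(text),
        "word_count": len(words),
        # Case-insensitive uniqueness is an assumption of this sketch.
        "unique_word_count": len({w.lower() for w in words}),
        # Counts uppercase characters; counting fully uppercase words
        # would be an equally plausible reading.
        "uppercase_count": sum(1 for ch in text if ch.isupper()),
        # "Special" is taken here to mean ASCII punctuation.
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


features = extract_features("WIN a FREE prize!!! Call now, win big.")
```

Features like these are typically stacked next to the vectorized text before training, though how the Lite model combines them is not stated here.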
How to Run
- Train the Model:
python training/train_model_lite.py
- Use the Model:
import joblib
model = joblib.load('models/model.pkl')
vectorizer = joblib.load('models/vectorizer.pkl')
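Once both objects are loaded, classifying new messages might look like the sketch below. It assumes the vectorizer follows the scikit-learn `transform` convention, the model exposes `predict`, and the model accepts the vectorizer output directly. If training also applied text preprocessing or appended extra features, those steps would need to be repeated first. The helper name `predict_messages` is hypothetical.

```python
def predict_messages(messages, model, vectorizer):
    """Vectorize raw messages and return the model's predicted labels."""
    # Use transform (not fit_transform) so the vocabulary learned during
    # training is reused unchanged.
    features = vectorizer.transform(messages)
    return list(model.predict(features))


# Hypothetical usage with the objects loaded above:
# labels = predict_messages(["Free entry! Text WIN to claim"], model, vectorizer)
```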
Legacy Model
Introduction
The Legacy model keeps the original spam detection logic without optimization, but updates the structure and adds visualizations.
Features
- Text Preprocessing: Porter Stemming, removal of stop words and punctuation.
- Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
- Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
- Metrics Saving: Accuracy and precision.
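Both models describe an ensemble of SVC, MultinomialNB, and ExtraTreesClassifier. One common way to combine such classifiers in scikit-learn is `VotingClassifier`; whether the training scripts use it, and with which voting rule, is not stated here. The sketch below is therefore an assumption: the toy corpus, the TF-IDF vectorizer, hard voting, and every hyperparameter are placeholders rather than the project's original or tuned values.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Tiny made-up corpus for illustration only; 1 = spam, 0 = ham.
texts = [
    "win a free prize now",
    "free cash claim your reward",
    "urgent winner call this number",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you send me the report",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Hyperparameters are placeholders, not the project's values.
ensemble = VotingClassifier(
    estimators=[
        ("svc", SVC(kernel="linear")),
        ("nb", MultinomialNB()),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",  # majority vote over the three predicted labels
)
ensemble.fit(X, labels)

new_messages = ["claim your free prize", "lunch at home"]
predictions = ensemble.predict(vectorizer.transform(new_messages))
```

Hard voting only needs each classifier's predicted label, so SVC does not have to produce probability estimates.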
How to Run
- Train the Model:
python training/train_model_legacy.py
- Use the Model:
import joblib
model = joblib.load('models/model.pkl')
vectorizer = joblib.load('models/vectorizer.pkl')
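Because the Legacy model is trained on stemmed text, new messages probably need the same preprocessing before they are passed to the vectorizer. The sketch below shows one way to do that with NLTK's `PorterStemmer`. The function name `preprocess_legacy` and the short stop-word set are assumptions for illustration; the training script presumably uses a fuller stop-word list.

```python
import string

from nltk.stem import PorterStemmer

# Small illustrative stop-word set, not the list used in training.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "you", "for", "in"}

_stemmer = PorterStemmer()
_strip_punct = str.maketrans("", "", string.punctuation)


def preprocess_legacy(text: str) -> str:
    """Lowercase, drop punctuation and stop words, then Porter-stem each token."""
    tokens = text.lower().translate(_strip_punct).split()
    return " ".join(_stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS)


cleaned = preprocess_legacy("You are the WINNER! Call now to claim the prizes.")
```

The cleaned string can then be passed to `vectorizer.transform([cleaned])`, assuming the saved vectorizer was fitted on text prepared the same way.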
Additional Information
- Dependencies: Python 3.6 or higher, pip, and the required packages listed in requirements.txt.
- Dataset: The dataset used for training is spam.csv.
- Contact and Support: For questions or support, please contact the project maintainers.
For more details, see README.md and models.md.