Akshat Sanghvi
---
title: Spam Email Detection
emoji: 💌
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 3.17.0
app_file: app.py
---
# Email Spam and Phishing URL Detection
This project utilizes Naive Bayes classification to detect whether an email is spam or not, and XGBoost classification to determine if a URL within an email is phishing or legitimate.
# Getting Started
## Project Overview
The project consists of two main components:
1. **Email Spam Detection**: This component uses Naive Bayes classification to label each email as spam or not spam based on its text content.
2. **Phishing URL Detection**: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
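To make the first component concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes over word counts. It is illustrative only: the project's own model is trained in `notebook.ipynb` (scikit-learn is listed in the requirements), and the class name, tokenizer, and toy training sentences below are hypothetical, not taken from the notebook or from `spam.csv`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.word_counts: dict[str, Counter] = {}  # label -> word frequencies
        self.doc_counts: Counter = Counter()       # label -> number of training emails
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSpamFilter":
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens: list[str], label: str) -> float:
        # log P(label) + sum over tokens of log P(token | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + self.alpha * len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[token] + self.alpha) / denominator)
        return score

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda label: self._log_score(tokens, label))


# Toy data for illustration only; the real model is trained on spam.csv.
train_texts = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text now",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the report by Friday?",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(model.predict("Claim your free cash prize"))      # spam
print(model.predict("Can you send the lunch report?"))  # ham
```

Scoring in log space avoids numeric underflow from multiplying many small probabilities, and the smoothing term `alpha` keeps a word that appears in only one class from driving the other class's probability to zero.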
## Prerequisites
Make sure Python 3.10 is installed on your system. You can download it from [python.org](https://www.python.org/downloads/).
## Requirements
Ensure you have the following dependencies installed. You can install them using `pip install -r requirements.txt`.
- gunicorn==22.0.0
- python-dateutil==2.8.2
- gradio==4.32.1
- gradio_client==0.17.0
- requests==2.31.0
- beautifulsoup4==4.12.3
- googlesearch_python==1.2.4
- urlextract==1.9.0
- numpy==1.26.3
- pandas==2.2.0
- scikit-learn==1.5.0
- urllib3==2.1.0
- python-whois==0.9.4
- xgboost==2.0.3
- lxml==5.2.2
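After installing, you can confirm that the environment matches these pins by comparing installed distribution versions against them. The script below is a hypothetical helper, not part of the repository; it uses only the standard library's `importlib.metadata`, and `PINNED` is a hand-copied subset of the list above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hand-copied subset of the pins in the Requirements list; extend as needed.
PINNED = {
    "gradio": "4.32.1",
    "numpy": "1.26.3",
    "pandas": "2.2.0",
    "scikit-learn": "1.5.0",
    "xgboost": "2.0.3",
}


def check_environment(pinned: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the environment matches."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"python: found {found}, README targets 3.10")
    for package, expected in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}: found {installed}, expected {expected}")
    return problems


if __name__ == "__main__":
    issues = check_environment(PINNED)
    print("\n".join(issues) if issues else "Environment matches the pinned requirements.")
```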
## Setup and Installation
1. Clone the repository:
```bash
git clone https://github.com/your-username/email-spam-phishing-detection.git
cd email-spam-phishing-detection
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
## Usage
1. **Data Preparation:**
- Ensure the datasets `spam.csv` and `urldata.csv` are available in the `data/` directory.
2. **Model Training:**
- If necessary, modify and run the `notebook.ipynb` Jupyter notebook to train or fine-tune the machine learning models.
- Trained models will be saved in the `models/` directory.
3. **Run the Application:**
- Run `python app.py` to start the application locally, then open the URL printed in the terminal.
- Alternatively, use the hosted version on the [Hugging Face Space](https://huggingface.co/spaces/akshatsanghvi/spam-email-detection).
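When the application receives an email, the phishing component needs the URLs it contains and a numeric feature vector for each one. The requirements list `urlextract` and `python-whois`, which suggests the app uses them for extraction and domain lookups; the sketch below instead uses a plain regex and a few lexical features so it runs with the standard library alone. The function names and feature set are hypothetical and may not match the columns of `urldata.csv`.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Plain-regex stand-in for urlextract: http(s) URLs and bare "www." hosts.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_urls(email_text: str) -> list[str]:
    """Return URLs in order of appearance, minus trailing sentence punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(email_text)]


def url_features(url: str) -> dict[str, int]:
    """Compute a few lexical features of the kind used in phishing detection (hypothetical set)."""
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url  # urlparse only fills in the host when a scheme is present
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)  # raw IP address instead of a domain name
        has_ip = 1
    except ValueError:
        has_ip = 0
    return {
        "has_ip": has_ip,
        "has_at": int("@" in url),
        "url_length": len(url),
        "url_depth": len([part for part in parsed.path.split("/") if part]),
        "hyphen_in_host": int("-" in host),
        "https": int(parsed.scheme == "https"),
        "num_dots": host.count("."),
    }


sample_email = (
    "Your account is locked. Verify at http://192.168.10.5/secure-login/verify.php "
    "today, or visit https://www.example.com."
)
for url in extract_urls(sample_email):
    print(url, url_features(url))
```

To score a URL, these values would be arranged in the same column order used during training and passed to the trained XGBoost model; that step is left out because this README does not describe the saved model file or its exact feature columns.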
## Acknowledgements
- The email spam classification model is trained using the `spam.csv` dataset, sourced from the Kaggle dataset [Spam/ham mail](https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download).
- The URL phishing detection model is trained using the `urldata.csv` dataset, sourced from [Phishing Websites Dataset](https://www.kaggle.com/datasets).
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.