---
title: Spam Email Detection
emoji: 💌
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 3.17.0
app_file: app.py
---

# Email Spam and Phishing URL Detection

This project uses a Naive Bayes classifier to detect whether an email is spam, and an XGBoost classifier to determine whether a URL within an email is phishing or legitimate.

# Getting Started
## Project Overview

The project consists of two main components:

1. **Email Spam Detection**: This component employs Naive Bayes classification to classify emails as either spam or not spam based on their content features.

2. **Phishing URL Detection**: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
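
To illustrate the idea behind the first component, here is a minimal pure-Python sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the project's implementation: the multinomial variant, the `TinyNaiveBayes` class, the `tokenize` helper, and the toy messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, made up for illustration (not taken from spam.csv).
texts = [
    "win a free prize now",
    "free cash offer click now",
    "meeting moved to monday",
    "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]
model = TinyNaiveBayes().fit(texts, labels)
print(model.predict("claim your free prize"))
```

In practice a library implementation trained on the full dataset would replace this toy class; the sketch only shows how word counts and class priors combine into a prediction.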

## Prerequisites
Make sure you have Python 3.10 installed on your system. You can download it from [python.org](https://www.python.org/).

## Requirements
Ensure you have the following dependencies installed. You can install them using `pip install -r requirements.txt`.

- gunicorn==22.0.0
- python-dateutil==2.8.2
- gradio==4.32.1
- gradio_client==0.17.0
- requests==2.31.0
- beautifulsoup4==4.12.3
- googlesearch_python==1.2.4
- urlextract==1.9.0
- numpy==1.26.3
- pandas==2.2.0
- scikit-learn==1.5.0
- urllib3==2.1.0
- python-whois==0.9.4
- xgboost==2.0.3
- lxml==5.2.2
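
If you want to confirm that the installed versions match the pins above, a small check along these lines may help. This is a sketch using only the standard library; the `check_pins` helper and its report format are hypothetical and not part of this repository.

```python
from importlib import metadata


def check_pins(pins):
    """Map each 'name==version' pin to 'ok', 'missing', or the installed version."""
    report = {}
    for pin in pins:
        name, _, wanted = pin.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Report the installed version when it differs from the pin.
        report[name] = "ok" if installed == wanted else installed
    return report


# Example with a few pins copied from the list above.
print(check_pins(["xgboost==2.0.3", "scikit-learn==1.5.0", "gradio==4.32.1"]))
```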

## Setup and Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/email-spam-phishing-detection.git
   cd email-spam-phishing-detection
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

## Usage
1. **Data Preparation:**
   - Ensure the datasets `spam.csv` and `urldata.csv` are available in the `data/` directory.

2. **Model Training:**
   - If necessary, modify and run the `notebook.ipynb` Jupyter notebook to train or fine-tune the machine learning models.
   - Trained models will be saved in the `models/` directory.

3. **Run the Application:**
   - Run `python app.py` to start the application locally.
   - Alternatively, use the hosted version on the [Hugging Face Space](https://huggingface.co/spaces/akshatsanghvi/spam-email-detection).
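
Before training or running the app, it can be useful to verify the data preparation step. The following sketch checks that both dataset files exist in `data/`. The `missing_datasets` helper is hypothetical and not part of this repository.

```python
from pathlib import Path

# Dataset file names taken from the Data Preparation step.
REQUIRED_DATASETS = ("spam.csv", "urldata.csv")


def missing_datasets(data_dir="data"):
    """Return the names of required dataset files not found in data_dir."""
    data_path = Path(data_dir)
    return [name for name in REQUIRED_DATASETS if not (data_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All datasets found.")
```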

## Acknowledgements

- The email spam classification model is trained using the `spam.csv` dataset, sourced from [Dataset: Spam/ham mail](https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download).
- The URL phishing detection model is trained using the `urldata.csv` dataset, sourced from [Phishing Websites Dataset](https://www.kaggle.com/datasets).

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.