Akshat Sanghvi committed on
Commit
d75c366
•
1 Parent(s): b9fe88f

Update README.md

Files changed (1)
  1. README.md +69 -10
README.md CHANGED
@@ -6,14 +6,73 @@ colorTo: blue
6
  sdk: gradio
7
  sdk_version: 3.17.0
8
  app_file: app.py
9
- pinned: false
10
- license: artistic-2.0
11
  ---
12
- # spam-mail-detection
13
- A simple text classifier in Python that uses the Naive Bayes model to classify e-mails as spam or ham,
14
- in other words, it used naive-bayes method to detect if a email or message is spam or not.
15
- ### What is a Spam message ?
16
- Spam is any kind of unwanted, unsolicited digital communication that gets sent out in bulk.
17
- Often spam is sent via email, but it can also be distributed via text messages, phone calls, or social media.
18
-
19
- dataset downloaded from kaggle. 👉 https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download
 
6
  sdk: gradio
7
  sdk_version: 3.17.0
8
  app_file: app.py
9
  ---
10
+
11
+ # Email Spam and Phishing URL Detection
12
+
13
+ This project uses a Naive Bayes classifier to detect whether an email is spam, and an XGBoost classifier to determine whether a URL within an email is phishing or legitimate.
14
+
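To make the Naive Bayes idea concrete, here is a minimal standard-library sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the project's training code: the class name `TinyNaiveBayes`, the whitespace tokenizer, and the four training messages are all made up for illustration, and the real model is trained on `spam.csv`.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class TinyNaiveBayes:
    """Hypothetical multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        # Count documents per class (for priors) and words per class (for likelihoods).
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen during training
                # Add the smoothed log likelihood of the word given the class.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not taken from spam.csv).
train_texts = [
    "win a free prize now",
    "free money claim your prize",
    "meeting moved to monday",
    "lunch at noon tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("see you at lunch on monday"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the add-one smoothing keeps a word that is unseen in one class from zeroing out that class entirely.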
15
+ # Getting Started
16
+ ## Project Overview
17
+
18
+ The project consists of two main components:
19
+
20
+ 1. **Email Spam Detection**: This component employs Naive Bayes classification to classify emails as either spam or not spam based on their content features.
21
+
22
+ 2. **Phishing URL Detection**: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
23
+
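The README does not say which URL features the XGBoost model consumes, so the following is only a sketch of the general approach: pull URLs out of an email body, then turn each one into a small numeric feature dictionary that a classifier could be trained on. The regular expression, the function names `extract_urls` and `url_features`, and the five features are assumptions chosen for illustration, not the project's actual feature set.

```python
import re
from urllib.parse import urlparse

# Simplified pattern: an http(s) scheme followed by non-space, non-bracket, non-quote characters.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_urls(text):
    """Return all http(s) URLs found in the text, in order of appearance."""
    return URL_PATTERN.findall(text)


def url_features(url):
    """Compute a few hypothetical lexical features for a single URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "host_is_ip": bool(IPV4_PATTERN.match(host)),
        "uses_https": parsed.scheme == "https",
    }


# Made-up email body for demonstration.
email_body = (
    "Verify your account at http://192.168.0.1/login "
    "or visit https://example.com/help"
)

for url in extract_urls(email_body):
    print(url, url_features(url))
```

Feature dictionaries like these could then be assembled into a table and passed to a gradient-boosted classifier; that training step is left out here because it depends on the columns of `urldata.csv`.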
24
+ ## Prerequisites
25
+ Make sure you have Python 3.10 installed on your system. You can download it from [python.org](https://www.python.org/).
26
+
27
+ ## Requirements
28
+ Ensure you have the following dependencies installed. You can install them using `pip install -r requirements.txt`.
29
+
30
+ - gunicorn==22.0.0
31
+ - python-dateutil==2.8.2
32
+ - gradio==4.32.1
33
+ - gradio_client==0.17.0
34
+ - requests==2.31.0
35
+ - beautifulsoup4==4.12.3
36
+ - googlesearch_python==1.2.4
37
+ - urlextract==1.9.0
38
+ - numpy==1.26.3
39
+ - pandas==2.2.0
40
+ - scikit-learn==1.5.0
41
+ - urllib3==2.1.0
42
+ - python-whois==0.9.4
43
+ - xgboost==2.0.3
44
+ - lxml==5.2.2
45
+
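After installing, it can be useful to confirm that the environment matches the pinned versions. The sketch below uses only the standard library's `importlib.metadata`; the helper names `parse_pins` and `check_pins` are hypothetical, and the two example pins are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return a list of human-readable mismatches between pins and the environment."""
    problems = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {pinned})")
            continue
        if installed != pinned:
            problems.append(f"{name}: installed {installed}, want {pinned}")
    return problems


# Two pins from the requirements list, used here as sample input.
EXAMPLE_PINS = ["xgboost==2.0.3", "lxml==5.2.2"]
print(check_pins(parse_pins(EXAMPLE_PINS)))
```

An empty list means every checked package is installed at its pinned version. To check the full set, pass the lines of `requirements.txt` to `parse_pins` instead of `EXAMPLE_PINS`.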
46
+ ## Setup and Installation
47
+
48
+ 1. Clone the repository:
49
+
50
+ ```bash
51
+ git clone https://github.com/your-username/email-spam-phishing-detection.git
52
+ cd email-spam-phishing-detection
53
+ ```
54
+ 2. Install dependencies:
55
+ ```bash
56
+ pip install -r requirements.txt
57
+ ```
58
+ ## Usage
59
+ 1. **Data Preparation:**
60
+ - Ensure the datasets `spam.csv` and `urldata.csv` are available in the `data/` directory.
61
+
62
+ 2. **Model Training:**
63
+ - If necessary, modify and run the `notebook.ipynb` Jupyter notebook to train or fine-tune the machine learning models.
64
+ - Trained models will be saved in the `models/` directory.
65
+
66
+ 3. **Run the Application:**
67
+ - Execute `app.py` to start the application.
68
+ - Access the application at [Hugging Face Space](https://huggingface.co/spaces/akshatsanghvi/spam-email-detection)
69
+
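Before running the notebook or the app, a quick pre-flight check for the two datasets from step 1 can save a confusing stack trace. This is a small sketch under the assumption that the script is run from the repository root; the helper name `missing_datasets` is hypothetical.

```python
from pathlib import Path

# File names taken from the Data Preparation step above.
REQUIRED_DATASETS = ("spam.csv", "urldata.csv")


def missing_datasets(data_dir="data"):
    """Return the required dataset file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_DATASETS if not (base / name).is_file()]


missing = missing_datasets()
if missing:
    print("Missing datasets in data/:", ", ".join(missing))
else:
    print("All datasets found.")
```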
70
+ ## Acknowledgements
71
+
72
+ - The email spam classification model is trained using the `spam.csv` dataset, sourced from [Dataset: Spam/ham mail](https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download).
73
+ - The URL phishing detection model is trained using the `urldata.csv` dataset, sourced from [Phishing Websites Dataset](https://www.kaggle.com/datasets).
74
+
75
+ ## License
76
+
77
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
78
+