from joblib import load
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
import streamlit as st
info = [
    {"title": "NAME", "detail": "AKINBITAN TAIWO EMMANUEL"},
    {"title": "MATRIC NO", "detail": "HNDCOM/22/032"},
    {"title": "CLASS", "detail": "HND2"},
    {"title": "LEVEL", "detail": "400L"},
    {"title": "PROJECT SUPERVISOR", "detail": ""},
]
st.title("Project Information")
for item in info:
    st.write(f"{item['title']}: {item['detail']}")
st.image('fcahpt.jpg', caption='Federal College of Animal Health and Production Technology')
st.header('Spam Detection using Naive Bayes Classifier')
st.write('This spam detector was built in Python using a Naive Bayes classifier.')
# Load the fitted vectorizer and classifier once, before handling any input.
vectorizer = load('tfidf_vectorizer.joblib')
model = load('Naive_Bayes_Spam_Detection.joblib')
user_input = st.text_area("Enter some text:", "")
# st.text_area returns an empty string (not None) while the box is empty,
# so only classify once there is real text to score.
if user_input.strip():
    x = vectorizer.transform([user_input])
    pred = model.predict(x)
    if pred[0] == 1:
        st.markdown("<b>Prediction: <span style='color:red'>The entered text is likely to be spam. Be careful.</span></b>", unsafe_allow_html=True)
    elif pred[0] == 0:
        st.markdown("<b>Prediction: <span style='color:green'>The entered text is not spam and appears safe.</span></b>", unsafe_allow_html=True)
    else:
        st.write('Error, try again')
st.header("Project Description")
st.markdown("""
Spam detection with a Naive Bayes classifier is a classic and effective approach for automatically identifying spam emails or messages.
The sections below walk through how it works:
""")
st.header("1. Data Collection and Preprocessing:")
st.markdown("""
- The process begins with collecting a dataset of emails or messages labeled as spam or non-spam (ham).
- Each message undergoes preprocessing steps such as removing HTML tags, punctuation, and stopwords (commonly occurring words like "and", "the", etc.).
- The text is then tokenized and transformed into numerical representations using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or Count Vectorization.
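
The preprocessing and TF-IDF steps above can be sketched in plain Python. This is a minimal illustration, not the code used to build this app's vectorizer: the stopword list, the sample messages, and the helper names `preprocess` and `tfidf` are assumptions, and the weighting is the textbook `tf * ln(N / df)` rather than the smoothed, normalised variant that scikit-learn applies by default.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the app's real list).
STOPWORDS = {"and", "the", "a", "to", "is", "you", "at"}

def preprocess(text):
    # Strip HTML tags, lowercase, keep alphanumeric tokens, drop stopwords.
    text = re.sub("<[^>]+>", " ", text)
    tokens = re.findall("[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def tfidf(docs):
    # Textbook weighting: tf(t, d) * ln(N / df(t)).
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many messages each token appears.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        })
    return vectors

# Hypothetical sample messages.
docs = ["<b>Win</b> a free prize now!", "Meeting at noon, free to join?"]
vectors = tfidf(docs)
print(preprocess(docs[0]))
print(vectors[0])
```

In this sketch a word that occurs in every message, such as "free", gets a weight of zero, while words unique to one message get a positive weight.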
""")
st.header("2. Understanding Naive Bayes Classifier:")
st.markdown("""
- Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, which gives the probability of one event given that another event has occurred.
- The "naive" assumption in Naive Bayes is that the features are conditionally independent given the class label. This simplifies the calculation and makes the algorithm computationally efficient.
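
As a worked illustration of Bayes' theorem with the naive independence assumption, the sketch below computes the probability that a message is spam from a prior and per-word likelihoods. All of the numbers and the function name `posterior_spam` are made up for the example; they are not taken from this app's trained model.

```python
# Hypothetical probabilities, for illustration only.
p_spam = 0.4
p_ham = 0.6
# P(word | class) for two example words.
likelihood = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.05, "meeting": 0.20},
}

def posterior_spam(words):
    # Naive assumption: multiply per-word likelihoods as if independent.
    spam_score = p_spam
    ham_score = p_ham
    for w in words:
        spam_score *= likelihood["spam"][w]
        ham_score *= likelihood["ham"][w]
    # Bayes' theorem: normalise by the total evidence.
    return spam_score / (spam_score + ham_score)

print(round(posterior_spam(["free"]), 3))             # 0.8
print(round(posterior_spam(["free", "meeting"]), 3))  # 0.286
```

With these numbers, seeing "free" alone makes spam the more likely class, while adding "meeting" pulls the estimate back toward ham.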
""")
st.header("3. Training the Naive Bayes Model:")
st.markdown("""
- The dataset is split into training and testing sets.
- During training, the Naive Bayes classifier learns the probability distribution of words or features given each class (spam or ham).
- It calculates the prior probabilities of spam and ham messages and the likelihood of each word occurring in spam and ham messages.
- These probabilities are estimated from the training data using maximum likelihood estimation, usually combined with smoothing (such as Laplace smoothing) so that words unseen in one class do not receive zero probability.
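
The training step can be sketched as counting. The example below estimates log priors and Laplace-smoothed log likelihoods from a toy dataset; the function name `train`, the `alpha` parameter, and the three sample messages are assumptions chosen for illustration, and this is not how scikit-learn implements its classifier internally.

```python
import math
from collections import Counter

def train(messages, labels, alpha=1.0):
    # messages: list of token lists; labels: parallel list of class names.
    classes = sorted(set(labels))
    vocab = {t for tokens in messages for t in tokens}
    log_prior = {}
    log_likelihood = {}
    for c in classes:
        docs = [m for m, y in zip(messages, labels) if y == c]
        # Prior: fraction of training messages that belong to class c.
        log_prior[c] = math.log(len(docs) / len(messages))
        counts = Counter(t for tokens in docs for t in tokens)
        # Laplace smoothing: add alpha to every word count.
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((counts[t] + alpha) / total) for t in vocab
        }
    return log_prior, log_likelihood

# Hypothetical toy training set.
messages = [["free", "prize"], ["free", "cash"], ["meeting", "noon"]]
labels = ["spam", "spam", "ham"]
log_prior, log_likelihood = train(messages, labels)
print(round(math.exp(log_likelihood["spam"]["free"]), 3))
print(round(math.exp(log_likelihood["ham"]["free"]), 3))
```

Because of the smoothing, "free" still has a small non-zero probability under ham even though it never appears in a ham message in this toy set.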
""")
st.header("4. Classification:")
st.markdown("""
- Once the model is trained, it can classify new, unseen messages.
- Given a new message, the classifier calculates the probability that it belongs to each class (spam or ham) using Bayes' theorem.
- By default, the message is assigned to the class with the highest probability. With two classes this is the same as labelling a message spam when its spam probability exceeds 0.5; a different predefined threshold can be used to make the filter stricter or more lenient.
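
A minimal sketch of the classification step is shown below. It sums log-probabilities (to avoid numerical underflow on long messages), converts them back to a spam probability, and compares that with a threshold. The model parameters and the names `spam_probability` and `classify` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical model parameters, for illustration only.
log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}
log_likelihood = {
    "spam": {"free": math.log(0.30), "meeting": math.log(0.02)},
    "ham": {"free": math.log(0.05), "meeting": math.log(0.20)},
}

def spam_probability(tokens):
    # Sum log-probabilities per class; tokens unknown to the model are skipped.
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    # Convert log scores back to a normalised posterior probability.
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    return exp["spam"] / sum(exp.values())

def classify(tokens, threshold=0.5):
    # Label as spam only when the spam probability exceeds the threshold.
    return "spam" if spam_probability(tokens) > threshold else "ham"

print(classify(["free"]))
print(classify(["free", "meeting"]))
print(classify(["free"], threshold=0.9))
```

Raising the threshold makes the filter more conservative: the same message that is labelled spam at 0.5 is let through at 0.9 in this example.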
""")
st.header("5. Model Evaluation:")
st.markdown("""
- The performance of the Naive Bayes classifier is evaluated using metrics such as accuracy, precision, recall, and F1-score on a separate test dataset.
- These metrics help assess how well the model generalizes to unseen data and its effectiveness in distinguishing between spam and non-spam messages.
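
The metrics named above follow directly from counts of true positives, false positives, and false negatives. The sketch below computes them by hand; the function name `evaluate` and the label lists are made up for illustration (1 means spam, 0 means ham) and are not results from this project.

```python
def evaluate(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

For a spam filter, precision reflects how often a flagged message really is spam, and recall reflects how much of the actual spam gets caught.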
""")
st.header("6. Deployment and Fine-Tuning:")
st.markdown("""
- Once the model is trained and evaluated, it can be deployed for real-world use.
- Deployment may involve integrating the model into email systems or messaging platforms to automatically filter spam messages.
- Periodic updates and fine-tuning of the model may be necessary to adapt to changing spamming techniques and patterns.
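
One common part of deployment is saving the trained model once and reloading it wherever predictions are served. This app loads its model and vectorizer with joblib; the sketch below shows the same save-then-load idea using only Python's standard `pickle` module so that it runs without extra dependencies. The `model` dictionary and the file name are hypothetical stand-ins for a real fitted model.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: just some learned parameters.
model = {"log_prior": {"spam": -0.92, "ham": -0.51}, "version": 1}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")
    # Training side: persist the fitted parameters once.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Serving side: load them at start-up and reuse them for every request.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)  # True
```

Retraining on fresh data and replacing the saved file is one simple way to handle the periodic updates mentioned above. Only load model files from sources you trust, since unpickling can execute arbitrary code.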
""")