Andika Atmanegara Putra
add all files
6bbca31
import streamlit as st
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from PIL import Image
st.set_page_config(
page_title='Diabetes Prediction',
layout='wide',
initial_sidebar_state='expanded'
)
def run():
# title
st.title('Diabetes Exploration')
st.subheader('Explore The Diabetes Metrics & Dataset')
# add pic
image = Image.open('diabetes.png')
st.image(image)
st.markdown('---')
markdown_text = '''
## Backgorund
Firstly, diabetes is a prevalent and chronic health condition that affects a significant portion of the population worldwide.
By providing a prediction model for diabetes, it can contribute to early detection and intervention, which is crucial in
managing the disease and preventing complications. Secondly, the integration of a diabetes prediction model in the web project
aims to enhance user experience and provide personalized health insights. Users can input their relevant health data, such as BMI,
blood glucose levels, and other factors, to obtain a prediction of their likelihood of having diabetes.
This information can empower individuals to make informed decisions about their health, seek appropriate medical attention
if necessary, and adopt preventive measures to reduce the risk of diabetes. Overall, the inclusion of a diabetes prediction
feature aligns with the objective of promoting health awareness and enabling users to take proactive steps towards their
well-being.
## Problem Statement
Using a dataset obtained from Kaggle, the goal is to build a predictive model that determines whether
individuals with specific characteristics are likely to have diabetes or not.
## Objective
The objectives of this project are to preprocess the dataset, explore its features, analyze the data,
implement four different algorithms for predicting the target variable, and perform Hyperparameter Tuning
to optimize the models' performance.
## About Dataset
| Variable | Description |
|-------------------------|-----------------------------------------------------------------------------------------------|
| Gender | Gender refers to the biological sex of the individual |
| Age | Age is an important factor as diabetes is more commonly diagnosed in older adults |
| hypertension | Hypertension is a medical condition in which the blood pressure in the arteries is |
| | persistently elevated (1 = True, 0 = False) |
| heart_disease | Heart disease is another medical condition that is associated with an increased risk of |
| | developing diabetes |
| smoking_history | Smoking history is also considered a risk factor for diabetes. |
| bmi | BMI (Body Mass Index) is a measure of body fat based on weight and height |
| HbA1c_level | HbA1c (Hemoglobin A1c) level is a measure of a person's average blood sugar level over the |
| | past 2-3 months |
| blood_glucose_level | Blood glucose level refers to the amount of glucose in the bloodstream at a given time |
| diabetes | Diabetes is the target variable being predicted (1 = True, 0 = False) |
'''
st.markdown(markdown_text)
st.markdown('---')
st.subheader('Data Exploratory')
st.markdown('---')
st.write('### Patient Information')
# show dataframe
data = pd.read_csv('diabetes_prediction_dataset.csv')
st.dataframe(data)
st.markdown('---')
# Distribusi Penderita Diabetes
fig, ax = plt.subplots()
plt.pie(data['diabetes'].value_counts(),
labels=['non-diabetic', 'diabetic'],
autopct='%1.1f%%',
colors=['Grey', 'red'],
startangle=25,
explode=[0.05, 0.05])
plt.title('Diabetes Distribution')
plt.axis('equal')
st.pyplot(fig)
'''
Based on the chart above, around 91.5% of the total 100,000 patients do
not suffer from diabetes and only **8.5%** of patients **do have diabetes**.
91.5% of total non-diabetic patients will be analyzed with health factors
to predict whether the patient or others can get diabetes or not
'''
st.markdown('---')
# visual barplot
st.subheader('Chart Based on User Input ')
st.markdown('---')
choice = st.selectbox('Pick Numeric Columns: ', ('age',
'heart_disease',
'bmi',
'HbA1c_level', 'blood_glucose_level'))
fig,ax = plt.subplots(figsize=(15,10))
sns.kdeplot(data[choice], fill=True)
ax.set_title(choice.capitalize()+' Ratio')
st.pyplot(fig)
st.markdown('---')
# visual 2
## Categorical Data Plot
pilihan_kategori = st.selectbox('Pick Category Column : ', ('gender','hypertension','smoking_history','diabetes'))
fig= plt.figure(figsize=(8, 6))
sns.countplot(data=data, x=pilihan_kategori, hue='diabetes', palette='Set2')
plt.xlabel(pilihan_kategori.capitalize())
plt.ylabel('Count')
plt.title(pilihan_kategori.capitalize()+' Ratio')
plt.legend(title='Diabetes')
st.pyplot(fig)
if __name__ == '__main__':
run()