import streamlit as st
import pandas as pd
import os
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

st.set_page_config(layout="wide")
st.title("🩺 Diabetic Retinopathy Project")

# Tabs
tab1, tab2, tab3 = st.tabs(["📂 Dataset Info", "📊 Training Visualization", "🤖 Algorithm Used"])

# =============================
# Tab 1: Dataset Information
# =============================
with tab1:
    st.markdown("""
    ### 🧾 Dataset Overview

    **Dataset Description:**
    The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**.
    The images are labeled into 5 classes based on DR severity:
    - **No_DR**
    - **Mild**
    - **Moderate**
    - **Severe**
    - **Proliferative_DR**

    Poor-quality images were removed and black backgrounds were deleted, leaving **12,521 images**.

    [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)

    ### 🧪 Data Preparation & Splitting
    - All images resized to **224x224**
    - **70% Training**, **30% Testing** (stratified by class)
    """)

# =============================
# Tab 2: Training Visualization
# =============================
with tab2:
    st.markdown("### 📊 Training Data Class Distribution")

    # CSV path and image folder path (adjust as needed)
    CSV_PATH = "./dataset/DR_grading.csv"
    IMG_FOLDER = "./dataset/images"  # Folder where all images are stored

    # Load CSV
    df = pd.read_csv(CSV_PATH)

    # Map the numeric 'diagnosis' column (0 to 4) to a readable 'label' column
    label_map = {
        0: "No_DR",
        1: "Mild",
        2: "Moderate",
        3: "Severe",
        4: "Proliferative_DR",
    }
    df['label'] = df['diagnosis'].map(label_map)

    # --- Metric 1: Full Dataset Table ---
    st.subheader("1️⃣ Full Dataset Table")
    st.dataframe(df, use_container_width=True)

    # --- Metric 2: Class Distribution ---
    st.subheader("2️⃣ Class Distribution")
    class_counts = df['label'].value_counts().reset_index()
    class_counts.columns = ['Class', 'Count']
    fig1, ax1 = plt.subplots()
    # hue='Class' with legend=False keeps per-bar colors while avoiding the
    # deprecation warning newer seaborn versions raise for palette without hue
    sns.barplot(data=class_counts, x='Class', y='Count', hue='Class',
                palette='rocket', legend=False, ax=ax1)
    ax1.set_title("Class Distribution")
    st.pyplot(fig1)

    # --- Metric 3: Sample Images Per Class ---
    st.subheader("3️⃣ Sample Images Per Class")
    cols = st.columns(len(class_counts))
    for i, label in enumerate(class_counts['Class']):
        sample_row = df[df['label'] == label].iloc[0]  # First image of this class
        # id_code is assumed to already include the file extension (e.g. .png)
        img_path = os.path.join(IMG_FOLDER, sample_row['id_code'])
        if os.path.exists(img_path):
            image = Image.open(img_path)
            cols[i].image(image, caption=label, use_container_width=True)
        else:
            cols[i].write(f"Image not found: {sample_row['id_code']}")

# =============================
# Tab 3: Algorithm Used
# =============================
with tab3:
    st.markdown("""
    ### 🤖 Model and Algorithm

    We used **Transfer Learning** with **DenseNet121** for DR classification.

    #### 🏗️ Model Details:
    - Model: **DenseNet121** (pretrained on **ImageNet**)
    - Input Image Size: **224x224**
    - Batch Size: **32**
    - Optimizer: **AdamW** (learning rate = **1e-3**)
    - Loss Function: **Categorical Crossentropy**
    - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**

    #### 📊 Evaluation Results:
    - **Top-1 Accuracy:** 85.0%
    - **Top-2 Accuracy:** 84.9%
    - **Top-3 Accuracy:** 84.6%

    #### 🖥️ Training Environment:
    - **Operating System:** Windows
    - **Hardware:** CPU only (no GPU)
    - **Epochs:** 15
    - **Training Time:** ~41 minutes per epoch

    Training on a CPU is much slower than on a GPU, so we limited the run to 15 epochs to save time.

    DenseNet121 was selected because its dense connections pass features directly to deeper layers, which improves learning and reduces overfitting. This is especially useful for medical images such as fundus scans.

    https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification
    """)
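The Tab 1 text describes a **70% / 30% split stratified by class**, but that step is not shown in the app itself. A minimal sketch of such a stratified split, using only pandas/NumPy and a hypothetical toy frame standing in for `DR_grading.csv` (column names assumed from the app's code), could look like:

```python
import numpy as np
import pandas as pd

def stratified_split(df, label_col, test_frac=0.30, seed=42):
    """Split a DataFrame into train/test, preserving per-class proportions."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for _, group in df.groupby(label_col):
        # Sample test_frac of each class independently, so every
        # class keeps (roughly) the same share in both splits
        n_test = int(round(len(group) * test_frac))
        test_idx.extend(rng.choice(group.index, size=n_test, replace=False))
    test_df = df.loc[test_idx]
    train_df = df.drop(index=test_idx)
    return train_df, test_df

# Toy frame mimicking the CSV's structure (hypothetical data)
toy = pd.DataFrame({
    "id_code": [f"img_{i}.png" for i in range(100)],
    "label": ["No_DR"] * 70 + ["Mild"] * 30,
})
train_df, test_df = stratified_split(toy, "label")
```

With the toy frame above, 21 of the 70 `No_DR` rows and 9 of the 30 `Mild` rows land in the test split, matching the 70/30 ratio per class. In practice the same result is usually obtained with `sklearn.model_selection.train_test_split(..., stratify=df['label'])`.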