import streamlit as st
import pandas as pd
import os
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
st.set_page_config(layout="wide")
st.title("🩺 Diabetic Retinopathy Project")
# Tabs
tab1, tab2, tab3 = st.tabs(["📂 Dataset Info", "📊 Training Visualization", "🤖 Algorithm Used"])
# =============================
# Tab 1: Dataset Information
# =============================
with tab1:
    st.markdown("""
### 🧾 Dataset Overview
**Dataset Description:**
The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. Each image is labeled with one of 5 classes based on DR severity:
- **No_DR**
- **Mild**
- **Moderate**
- **Severe**
- **Proliferative_DR**

Poor-quality images were removed and the black backgrounds were removed, leaving **12,521 images**.

[📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)

### 🧪 Data Preparation & Splitting
- All images resized to **224x224**
- **70% Training**, **30% Testing** (stratified by class)
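
A stratified 70/30 split keeps each DR grade's share the same in the training and testing sets. The plain-Python sketch below illustrates the idea; it is not the project's actual preprocessing code. `stratified_split` and its `seed` parameter are hypothetical, and the toy labels stand in for the `diagnosis` column (0-4).

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=42):
    # Group sample indices by class so each class is split separately
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)  # train_frac of this class (0.7 by default)
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels standing in for the 'diagnosis' column (0-4)
labels = [0] * 50 + [1] * 10 + [2] * 20 + [3] * 10 + [4] * 10
train, test = stratified_split(labels)
print(len(train), len(test))  # 70 30
```

In practice, scikit-learn's `train_test_split(X, y, test_size=0.3, stratify=y)` performs the same kind of per-class split.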
""")
# =============================
# Tab 2: Training Visualization
# =============================
with tab2:
    st.markdown("### 📊 Training Data Class Distribution")

    # CSV path and image folder path (adjust as needed)
    CSV_PATH = "./dataset/DR_grading.csv"
    IMG_FOLDER = "./dataset/images"  # Folder where all images are stored

    # Load CSV
    df = pd.read_csv(CSV_PATH)

    # Map the numeric 'diagnosis' grades (0-4) to readable class names in a new 'label' column
    label_map = {
        0: "No_DR",
        1: "Mild",
        2: "Moderate",
        3: "Severe",
        4: "Proliferative_DR",
    }
    df['label'] = df['diagnosis'].map(label_map)

    # --- Metric 1: Full Dataset Table ---
    st.subheader("1️⃣ Full Dataset Table")
    st.dataframe(df, use_container_width=True)

    # --- Metric 2: Class Distribution ---
    st.subheader("2️⃣ Class Distribution")
    class_counts = df['label'].value_counts().reset_index()
    class_counts.columns = ['Class', 'Count']
    fig1, ax1 = plt.subplots()
    # Assigning hue='Class' (with legend=False) applies the palette per bar;
    # seaborn 0.13+ deprecates passing palette without hue
    sns.barplot(data=class_counts, x='Class', y='Count', hue='Class',
                palette='rocket', legend=False, ax=ax1)
    ax1.set_title("Class Distribution")
    st.pyplot(fig1)

    # --- Metric 3: Sample Images Per Class ---
    st.subheader("3️⃣ Sample Images Per Class")
    cols = st.columns(len(class_counts))
    for i, label in enumerate(class_counts['Class']):
        sample_row = df[df['label'] == label].iloc[0]  # First image of this class
        # Assumes 'id_code' already holds the full image filename, including its extension
        img_path = os.path.join(IMG_FOLDER, sample_row['id_code'])
        if os.path.exists(img_path):
            image = Image.open(img_path)
            cols[i].image(image, caption=label, use_container_width=True)
        else:
            cols[i].write(f"Image not found: {sample_row['id_code']}")
# =============================
# Tab 3: Algorithm Used
# =============================
with tab3:
    st.markdown("""
### 🤖 Model and Algorithm
We used **Transfer Learning** with **DenseNet121** for DR classification.
#### 🏗️ Model Details:
- Model: **DenseNet121** (pretrained on **ImageNet**)
- Input Image Size: **224x224**
- Batch Size: **32**
- Optimizer: **AdamW** (learning rate = **1e-3**)
- Loss Function: **Categorical Crossentropy**
- Evaluation Metrics: **Accuracy**, **Precision**, **Recall**
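
Categorical cross-entropy compares each one-hot label vector with the model's softmax probabilities and averages -sum(y * log(p)) over the batch. A minimal pure-Python sketch of the loss; the function name, the clipping `eps`, and the toy values are illustrative assumptions, not taken from the training code.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot label vectors; y_pred: predicted class probabilities (softmax output)
    total = 0.0
    for labels, probs in zip(y_true, y_pred):
        # Clip probabilities into [eps, 1 - eps] so log(0) never occurs
        total -= sum(y * math.log(min(max(p, eps), 1 - eps)) for y, p in zip(labels, probs))
    return total / len(y_true)  # mean loss over the batch

# Two toy samples over the 5 DR classes (No_DR, Mild, Moderate, Severe, Proliferative_DR)
y_true = [[0, 0, 1, 0, 0], [1, 0, 0, 0, 0]]
y_pred = [[0.1, 0.1, 0.6, 0.1, 0.1], [0.7, 0.1, 0.1, 0.05, 0.05]]
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.4338
```

Only the probability assigned to the true class contributes, so the loss falls as the model grows more confident in the correct grade.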
#### 📊 Evaluation Results:
- **Top-1 Accuracy:** 85.0%
- **Top-2 Accuracy:** 84.9%
- **Top-3 Accuracy:** 84.6%
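
The evaluation metrics listed under Model Details (accuracy, precision, recall) can all be derived from true and predicted class labels. The sketch below assumes macro averaging over classes, since the averaging scheme is not stated here; `evaluate` and the toy labels are hypothetical.

```python
from collections import Counter

def evaluate(y_true, y_pred, classes):
    # Overall accuracy: fraction of predictions that match the true label
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    pred_count = Counter(y_pred)                               # predictions per class
    true_count = Counter(y_true)                               # actual samples per class

    # Per-class precision = TP / predicted, recall = TP / actual (0.0 when undefined)
    precisions = [tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in classes]
    recalls = [tp[c] / true_count[c] if true_count[c] else 0.0 for c in classes]

    # Macro average: unweighted mean over classes
    return accuracy, sum(precisions) / len(classes), sum(recalls) / len(classes)

classes = [0, 1, 2, 3, 4]  # No_DR, Mild, Moderate, Severe, Proliferative_DR
y_true = [0, 0, 1, 2, 2, 3, 4, 4]
y_pred = [0, 0, 2, 2, 2, 3, 4, 0]
print(evaluate(y_true, y_pred, classes))  # accuracy 0.75, macro precision ~0.667, macro recall 0.7
```

Weighted averaging (per-class values weighted by class frequency) is the other common choice and can differ noticeably from the macro average on imbalanced class distributions.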
#### 🖥️ Training Environment:
- **Operating System:** Windows
- **Hardware:** CPU only (no GPU)
- **Epochs:** 15
- **Training Time:** ~41 minutes per epoch
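
Because training ran on a CPU, it was much slower than it would have been on a GPU: about 41 minutes per epoch, or roughly 10 hours for 15 epochs. For this reason, we limited training to 15 epochs.

DenseNet121 was selected because its dense connections pass each layer's feature maps directly to all deeper layers within the same dense block. This feature reuse improves gradient flow and helps reduce overfitting, which is especially useful for medical images such as retinal fundus photographs.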
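
To make the dense connectivity concrete, the sketch below tracks DenseNet121's channel count. Within a dense block, each layer receives the concatenation of all earlier feature maps and contributes `growth_rate` new channels, while the transition layers between blocks halve the channel count. It uses the standard DenseNet121 configuration (64 stem channels, growth rate 32, dense blocks of 6, 12, 24 and 16 layers); `densenet_channels` is a hypothetical helper for illustration, not a library function. The final 1024 channels are what global average pooling turns into the feature vector that a transfer-learning classifier head is typically attached to.

```python
def densenet_channels(num_init=64, growth_rate=32, block_config=(6, 12, 24, 16)):
    channels = num_init  # channels after the stem convolution and pooling
    block_outputs = []
    for i, num_layers in enumerate(block_config):
        # Each layer concatenates growth_rate new feature maps onto everything before it
        channels += num_layers * growth_rate
        block_outputs.append(channels)
        if i < len(block_config) - 1:
            channels //= 2  # transition layer halves the channels (compression 0.5)
    return block_outputs

print(densenet_channels())  # [256, 512, 1024, 1024]
```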

[📎 Reference: Deep learning-enhanced diabetic retinopathy image classification](https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification)
""")