Ci-Dave/DR_Classification: pages/Dataset.py

Last commit ddfabeb by 3v324v23: fixed path to work in huggingface


import streamlit as st
import pandas as pd
import os
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

st.set_page_config(layout="wide")
st.title("🩺 Diabetic Retinopathy Project")

# Tabs
tab1, tab2, tab3 = st.tabs(["📂 Dataset Info", "📊 Training Visualization", "🤖 Algorithm Used"])

# =============================
# Tab 1: Dataset Information
# =============================
with tab1:
    st.markdown("""
### 🧾 Dataset Overview

Dataset Description:

The DDR dataset contains 13,673 fundus images from 147 hospitals across 23 provinces in China. Each image is labeled with one of 5 classes based on DR severity:
- No_DR
- Mild
- Moderate
- Severe
- Proliferative_DR

Poor-quality images were discarded and black backgrounds were removed, leaving 12,521 images.

[📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)
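
The description above does not say exactly how the black backgrounds were removed. A common approach is to crop each image to the bounding box of the pixels that are brighter than a small threshold. The minimal pure-Python sketch below shows that idea on a toy grayscale grid. The function name `crop_black_border` and the threshold of 10 are hypothetical choices, not the dataset authors' actual procedure.

```python
def crop_black_border(gray, threshold=10):
    # gray: 2D list of grayscale values (0-255), one inner list per row.
    # Hypothetical helper: returns the bounding box of all pixels brighter than threshold.
    rows = [r for r, row in enumerate(gray) if any(p > threshold for p in row)]
    cols = [c for c in range(len(gray[0])) if any(row[c] > threshold for row in gray)]
    if not rows or not cols:
        return gray  # fully dark image: nothing to crop
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


# Toy 4x5 image: bright content surrounded by a black border.
image = [
    [0, 0, 0, 0, 0],
    [0, 120, 200, 0, 0],
    [0, 90, 150, 0, 0],
    [0, 0, 0, 0, 0],
]
print(crop_black_border(image))  # [[120, 200], [90, 150]]
```

This app already imports Pillow, which offers the same bounding box directly. Call `getbbox()` on a thresholded grayscale copy, for example `img.convert('L').point(lambda p: 255 if p > 10 else 0).getbbox()`, and pass the result to `img.crop()`.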

### 🧪 Data Preparation & Splitting

- All images resized to 224x224
- 70% Training, 30% Testing (stratified by class)
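
A stratified split keeps each severity class at roughly the same proportion in the training and test sets. The minimal sketch below shows the idea in plain Python. The function name `stratified_split` and the fixed seed are illustrative assumptions, not the project's actual splitting code. In practice, scikit-learn's `train_test_split(indices, test_size=0.3, stratify=labels)` performs the same kind of split.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=42):
    # Group sample indices by class so that each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # ~30% of this class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 10 No_DR samples and 20 Mild samples.
labels = ['No_DR'] * 10 + ['Mild'] * 20
train, test = stratified_split(labels)
print(len(train), len(test))  # 21 9
```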
""")

# =============================
# Tab 2: Training Visualization
# =============================
with tab2:
    st.markdown("### 📊 Training Data Class Distribution")

    # CSV path and image folder path (adjust as needed)
    CSV_PATH = "./dataset/DR_grading.csv"
    IMG_FOLDER = "./dataset/images"  # Folder where all images are stored

    # Load CSV
    df = pd.read_csv(CSV_PATH)
    # Map the numeric 'diagnosis' grades (0 to 4) to readable class names in a new 'label' column
    label_map = {
        0: "No_DR",
        1: "Mild",
        2: "Moderate",
        3: "Severe",
        4: "Proliferative_DR",
    }
    df['label'] = df['diagnosis'].map(label_map)

    # --- Metric 1: Full Dataset Table ---
    st.subheader("1️⃣ Full Dataset Table")
    st.dataframe(df, use_container_width=True)

    # --- Metric 2: Class Distribution ---
    st.subheader("2️⃣ Class Distribution")
    class_counts = df['label'].value_counts().reset_index()
    class_counts.columns = ['Class', 'Count']

    fig1, ax1 = plt.subplots()
    sns.barplot(data=class_counts, x='Class', y='Count', palette='rocket', ax=ax1)
    ax1.set_title("Class Distribution")
    st.pyplot(fig1)

    # --- Metric 3: Sample Images Per Class ---
    st.subheader("3️⃣ Sample Images Per Class")

    cols = st.columns(len(class_counts))
    for i, label in enumerate(class_counts['Class']):
        sample_row = df[df['label'] == label].iloc[0]  # Get first image of this class
        # id_code is used as the filename as-is, so it is assumed to already include the file extension
        img_path = os.path.join(IMG_FOLDER, sample_row['id_code'])
        if os.path.exists(img_path):
            image = Image.open(img_path)
            cols[i].image(image, caption=label, use_container_width=True)
        else:
            cols[i].write(f"Image not found: {sample_row['id_code']}")

# =============================
# Tab 3: Algorithm Used
# =============================
with tab3:
    st.markdown("""
### 🤖 Model and Algorithm

We used transfer learning with DenseNet121 for DR classification.

#### 🏗️ Model Details:
- Model: DenseNet121 (pretrained on ImageNet)
- Input Image Size: 224x224
- Batch Size: 32
- Optimizer: AdamW (learning rate = 1e-3)
- Loss Function: Categorical Crossentropy
- Evaluation Metrics: Accuracy, Precision, Recall
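
The loss and optimizer listed above can be sketched in plain Python for a single sample and a single scalar weight:

- **Categorical cross-entropy.** With a one-hot target, this reduces to the negative log-probability of the true class.
- **AdamW.** This is Adam with weight decay applied directly to the weights instead of being added to the gradient.

The function names and the `weight_decay=0.01` default are assumptions for illustration. The framework and weight-decay value used for this model are not stated.

```python
import math


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    # Loss for one sample: -sum_c y_c * log(p_c). With a one-hot target,
    # only the true class contributes, so this equals -log(p_true).
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # One AdamW update of a single scalar weight at step t (t starts at 1).
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: lr * weight_decay * w is subtracted directly,
    # instead of adding weight_decay * w to the gradient as in Adam + L2.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


# 5 DR classes; the true class is Moderate (index 2).
y_true = [0, 0, 1, 0, 0]
y_pred = [0.05, 0.10, 0.70, 0.10, 0.05]
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.3567

# Learning rate 1e-3 as listed above; moments start at zero on step 1.
w, m, v = adamw_step(w=0.5, grad=0.2, m=0.0, v=0.0, t=1)
print(w)
```

The decay term still shrinks the weight when the gradient is zero. That term is what distinguishes AdamW from plain Adam.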

#### 📊 Evaluation Results:
- Top-1 Accuracy: 85.0%
- Top-2 Accuracy: 84.9%
- Top-3 Accuracy: 84.6%
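
Top-k accuracy counts a prediction as correct when the true class is among the k classes with the highest predicted probability. The set of top-k classes only grows as k increases, so top-k accuracy can never decrease with k. A minimal pure-Python sketch follows; the function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(y_true, y_prob, k):
    # y_true: true class index per sample.
    # y_prob: per-sample list of predicted class probabilities.
    hits = 0
    for label, probs in zip(y_true, y_prob):
        # Indices of the k classes with the highest predicted probability.
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(y_true)


y_true = [2, 0, 4]
y_prob = [
    [0.05, 0.10, 0.70, 0.10, 0.05],  # true class ranked 1st
    [0.30, 0.40, 0.10, 0.10, 0.10],  # true class ranked 2nd
    [0.35, 0.30, 0.20, 0.10, 0.05],  # true class ranked 5th (last)
]
for k in (1, 2, 3):
    print(k, round(top_k_accuracy(y_true, y_prob, k), 3))
# 1 0.333
# 2 0.667
# 3 0.667
```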

#### 🖥️ Training Environment:
- Operating System: Windows
- Hardware: CPU only (no GPU)
- Epochs: 15
- Training Time: ~41 minutes per epoch

Because training ran on a CPU rather than a GPU, each epoch was slow: about 41 minutes, or roughly 10 hours for all 15 epochs.
To save time, training was therefore limited to 15 epochs.

DenseNet121 was selected because each layer receives the feature maps of all preceding layers in its dense block, so features are passed directly to deeper layers.
This feature reuse helps improve learning and reduces overfitting, which is especially useful for medical images such as retinal fundus photographs.
Reference: https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification
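
The sketch below models a dense block in plain Python. Each layer receives the concatenation of every earlier output and appends `growth_rate` new channels; it never overwrites old ones.

- **Toy layer.** `toy_layer` is a hypothetical stand-in for a real convolutional layer.
- **Channel bookkeeping.** The counts follow DenseNet-121's published configuration: 64 initial channels, growth rate 32, dense blocks of 6, 12, 24 and 16 layers, and transition layers that halve the channel count between blocks.

```python
def dense_block(inputs, num_layers, growth_rate, layer_fn):
    # Each layer sees ALL channels produced so far (input + every earlier layer);
    # its new channels are concatenated on, never overwriting old ones.
    features = list(inputs)
    for _ in range(num_layers):
        new_channels = layer_fn(features, growth_rate)
        assert len(new_channels) == growth_rate
        features = features + new_channels
    return features


def toy_layer(features, growth_rate):
    # Hypothetical stand-in for a real BN-ReLU-Conv layer:
    # every new channel is just the mean of all input channels.
    avg = sum(features) / len(features)
    return [avg] * growth_rate


def densenet121_channels(init=64, growth=32, blocks=(6, 12, 24, 16)):
    # Channel count at the output of each dense block.
    channels, outputs = init, []
    for i, n_layers in enumerate(blocks):
        channels += n_layers * growth  # each layer adds `growth` channels
        outputs.append(channels)
        if i < len(blocks) - 1:
            channels //= 2  # transition layer halves the channel count
    return outputs


out = dense_block([1.0, 2.0, 3.0], num_layers=2, growth_rate=2, layer_fn=toy_layer)
print(len(out))                # 7: 3 input channels + 2 layers x 2 new channels
print(densenet121_channels())  # [256, 512, 1024, 1024]
```

The final 1024 channels are the feature vector that a classification head sits on top of.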
""")