COVID-19 Prediction Model

This project implements a COVID-19 prediction system that compares four regression models, with a focus on Random Forest. It includes a Gradio user interface for deployment to Hugging Face Spaces.

Features

Memory-optimized data processing that handles multiple datasets with mixed column types, including object columns
Multiple regression models for comparison:
- Random Forest Regression
- Linear Regression
- Support Vector Regression (SVR)
- Gradient Boosting Regression
Gradio UI for easy model selection, visualization, and deployment to Hugging Face Spaces
Complete data preprocessing pipeline with feature engineering
Performance evaluation metrics and visualization

Project Structure

COVID-19-Prediction/
├── covid_full_dataset.csv       # Complete COVID-19 dataset
├── US_engineered_features.csv   # Engineered features for US data
├── raw_confirmed.csv            # Raw confirmed cases data
├── raw_deaths.csv               # Raw deaths data
├── raw_recovered.csv            # Raw recovered cases data
├── raw_owid.csv                 # Additional data from Our World in Data
├── covid_country_ts.csv         # Country-level time series data
├── preprocess_data.py           # Data preprocessing script
├── train_models.py              # Model training script
├── gradio_app.py                # Gradio UI for predictions
├── run_pipeline.py              # Complete pipeline runner
└── requirements.txt             # Project dependencies

Installation

Clone this repository:

git clone https://github.com/yourusername/covid19-prediction.git
cd covid19-prediction

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Run the Complete Pipeline

To run the complete pipeline (preprocessing, training, and UI):

python run_pipeline.py

Pipeline Options

Skip preprocessing: python run_pipeline.py --skip-preprocessing
Skip training: python run_pipeline.py --skip-training
Only launch UI: python run_pipeline.py --only-ui
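
The flags above could be wired up with `argparse` roughly as follows. This is a sketch, not the actual contents of `run_pipeline.py`: the `parse_args` and `plan_steps` helpers are hypothetical names, and treating `--only-ui` as skipping both earlier steps is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the pipeline flags listed above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="COVID-19 prediction pipeline")
    parser.add_argument("--skip-preprocessing", action="store_true",
                        help="Skip the preprocessing step")
    parser.add_argument("--skip-training", action="store_true",
                        help="Skip the training step")
    parser.add_argument("--only-ui", action="store_true",
                        help="Only launch the Gradio UI")
    return parser.parse_args(argv)


def plan_steps(args):
    """Return the ordered step names to run (assumed behaviour)."""
    if args.only_ui:
        return ["ui"]
    steps = []
    if not args.skip_preprocessing:
        steps.append("preprocess")
    if not args.skip_training:
        steps.append("train")
    steps.append("ui")  # the UI is assumed to launch last in every mode
    return steps


# Example: skipping training leaves preprocessing and the UI.
print(plan_steps(parse_args(["--skip-training"])))
```

Separating flag parsing from step planning keeps the planning logic easy to check without launching anything.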

Run Individual Steps

Data Preprocessing:
```
python preprocess_data.py
```
Model Training:
```
python train_models.py
```
Launch Gradio UI:
```
python gradio_app.py
```

Memory Optimization

This project is optimized to handle large datasets efficiently:

Uses appropriate data types to minimize memory footprint
Processes data in chunks for large files
Employs garbage collection to free memory
Uses compressed NumPy formats for storing processed data
Optimizes model parameters for memory efficiency
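
The techniques listed above might look like the sketch below. It is illustrative only and does not reproduce `preprocess_data.py`: the `downcast` and `load_in_chunks` helpers, the chunk size, and the column names in the sample data are all assumptions.

```python
import gc
import io

import numpy as np
import pandas as pd


def downcast(df):
    """Shrink numeric columns to the smallest dtype that holds their values."""
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif pd.api.types.is_float_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="float")
    return df


def load_in_chunks(source, chunksize=1000):
    """Read a CSV piece by piece, downcasting each chunk before combining."""
    chunks = [downcast(chunk) for chunk in pd.read_csv(source, chunksize=chunksize)]
    df = pd.concat(chunks, ignore_index=True)
    del chunks
    gc.collect()  # release the per-chunk frames promptly
    return df


# Synthetic stand-in for a large CSV file (column names are made up).
csv_text = "cases,deaths,rate\n" + "\n".join(
    f"{i},{i // 10},{i * 0.5}" for i in range(5000)
)
df = load_in_chunks(io.StringIO(csv_text), chunksize=1000)
print(df.dtypes)

# Store the processed arrays in NumPy's compressed .npz format.
buffer = io.BytesIO()  # a file path such as "processed.npz" works the same way
np.savez_compressed(
    buffer,
    features=df[["cases", "deaths"]].to_numpy(),
    target=df["rate"].to_numpy(),
)
```

Downcasting per chunk keeps peak memory low. `pd.concat` then widens a column to the largest dtype seen across the chunks, so no values are lost.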

Models

The project implements and compares four regression models:

Random Forest Regressor: An ensemble learning method that builds multiple decision trees and averages their predictions.
Linear Regression: A simple baseline model that assumes a linear relationship between features and target.
Support Vector Regression (SVR): Fits a regression function defined by support vectors; with a non-linear kernel it can capture non-linear relationships.
Gradient Boosting Regressor: An ensemble technique that builds trees sequentially, with each tree correcting errors made by previous ones.
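
A comparison of the four models can be sketched with scikit-learn as follows. This is not the code in `train_models.py`: the `compare_models` helper, the hyperparameters, the metrics chosen, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def compare_models(X, y, seed=0):
    """Fit each model on the same split and return its test-set scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "Random Forest": RandomForestRegressor(n_estimators=50, random_state=seed),
        "Linear Regression": LinearRegression(),
        "SVR": SVR(kernel="rbf"),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "mae": mean_absolute_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        }
    return results


# Synthetic stand-in data with a linear signal (not the COVID-19 dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

for name, scores in compare_models(X, y).items():
    print(f"{name}: MAE={scores['mae']:.3f}, R2={scores['r2']:.3f}")
```

SVR is sensitive to feature scale, so on real data it is usually worth standardizing the features first, for example with `StandardScaler` in a scikit-learn `Pipeline`.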

Hugging Face Deployment

The Gradio interface is configured for easy deployment to Hugging Face Spaces:

Create a new Space on Hugging Face
Upload all files to the Space
The app configures itself automatically for the Hugging Face environment
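
For orientation, a minimal Gradio interface with a model selector might look like the sketch below. It does not reproduce `gradio_app.py`: the `predict` function is a placeholder that simply echoes its input, and the input label and `build_demo` helper are assumptions.

```python
MODEL_NAMES = ["Random Forest", "Linear Regression", "SVR", "Gradient Boosting"]


def predict(model_name, previous_cases):
    """Placeholder: a real app would load the trained model and call .predict()."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"Unknown model: {model_name}")
    # Naive stand-in so the sketch runs without trained models: repeat the input.
    return float(previous_cases)


def build_demo():
    """Build the interface; Gradio is imported here so predict() works without it."""
    import gradio as gr

    return gr.Interface(
        fn=predict,
        inputs=[
            gr.Dropdown(choices=MODEL_NAMES, value=MODEL_NAMES[0], label="Model"),
            gr.Number(label="Previous day's confirmed cases"),
        ],
        outputs=gr.Number(label="Predicted cases"),
        title="COVID-19 Prediction",
    )
```

Calling `build_demo().launch()` starts the local server. On Spaces the same call runs when the app file is executed.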

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Data sources: Johns Hopkins CSSE, Our World in Data
Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Gradio