COVID-19 Prediction Model
This project implements a COVID-19 prediction system using regression models, centered on Random Forest regression and compared against three other regression models. The system includes a Gradio user interface that can be deployed to Hugging Face Spaces.
Features
- Memory-optimized data processing that can handle multiple datasets with different file formats and column data types
- Multiple regression models for comparison:
  - Random Forest Regression
  - Linear Regression
  - Support Vector Regression (SVR)
  - Gradient Boosting Regression
- Gradio UI for easy model selection, visualization, and deployment to Hugging Face Spaces
- Complete data preprocessing pipeline with feature engineering
- Performance evaluation metrics and visualization
Project Structure
COVID-19-Prediction/
├── covid_full_dataset.csv # Complete COVID-19 dataset
├── US_engineered_features.csv # Engineered features for US data
├── raw_confirmed.csv # Raw confirmed cases data
├── raw_deaths.csv # Raw deaths data
├── raw_recovered.csv # Raw recovered cases data
├── raw_owid.csv # Additional data from Our World in Data
├── covid_country_ts.csv # Country-level time series data
├── preprocess_data.py # Data preprocessing script
├── train_models.py # Model training script
├── gradio_app.py # Gradio UI for predictions
├── run_pipeline.py # Complete pipeline runner
└── requirements.txt # Project dependencies
Installation
Clone this repository:
git clone https://github.com/yourusername/covid19-prediction.git
cd covid19-prediction
Install the required packages:
pip install -r requirements.txt
Usage
Run the Complete Pipeline
To run the complete pipeline (preprocessing, training, and UI):
python run_pipeline.py
Pipeline Options
- Skip preprocessing:
python run_pipeline.py --skip-preprocessing
- Skip training:
python run_pipeline.py --skip-training
- Launch only the UI:
python run_pipeline.py --only-ui
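The README does not show how run_pipeline.py interprets these flags. A minimal sketch of one way to do it with argparse follows; the step order, the helper names (parse_args, select_steps, main), and running each script with subprocess are assumptions for illustration, not the project's actual code.

```python
import argparse
import subprocess
import sys

# Hypothetical step order, using the script names from the project structure.
ALL_STEPS = ["preprocess_data.py", "train_models.py", "gradio_app.py"]


def parse_args(argv=None):
    """Parse the pipeline flags; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="COVID-19 prediction pipeline")
    parser.add_argument("--skip-preprocessing", action="store_true")
    parser.add_argument("--skip-training", action="store_true")
    parser.add_argument("--only-ui", action="store_true")
    return parser.parse_args(argv)


def select_steps(args):
    """Return the scripts to run, in order, for the parsed flags."""
    if args.only_ui:
        return ["gradio_app.py"]
    steps = list(ALL_STEPS)
    if args.skip_preprocessing:
        steps.remove("preprocess_data.py")
    if args.skip_training:
        steps.remove("train_models.py")
    return steps


def main(argv=None):
    for script in select_steps(parse_args(argv)):
        # Run each step in a fresh interpreter; check=True stops at the first failure.
        subprocess.run([sys.executable, script], check=True)
```

Calling main() from an `if __name__ == "__main__":` guard would make `python run_pipeline.py --skip-training` run preprocessing and then launch the UI. Running each step as a separate process keeps memory from one stage from lingering into the next.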
Run Individual Steps
Data Preprocessing:
python preprocess_data.py
Model Training:
python train_models.py
Launch Gradio UI:
python gradio_app.py
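The feature engineering in preprocess_data.py is not reproduced in this README. As a hypothetical sketch of the kind of transformation involved, the snippet below assumes raw_confirmed.csv follows the Johns Hopkins CSSE wide layout (Province/State, Country/Region, Lat, Long, then one cumulative-count column per date in m/d/yy form) and derives daily new cases, a 7-day rolling mean, and lag features per country. The function names and the chosen features are illustrative, not the project's own.

```python
import pandas as pd

# Column layout of the JHU CSSE global time-series files (assumed for raw_confirmed.csv).
ID_COLS = ["Province/State", "Country/Region", "Lat", "Long"]


def wide_to_country_series(raw: pd.DataFrame) -> pd.DataFrame:
    """Melt one-column-per-date data into long form and sum provinces per country."""
    tidy = raw.melt(id_vars=ID_COLS, var_name="date", value_name="confirmed")
    tidy["date"] = pd.to_datetime(tidy["date"], format="%m/%d/%y")
    return (
        tidy.groupby(["Country/Region", "date"], as_index=False)["confirmed"]
        .sum()
        .sort_values(["Country/Region", "date"], ignore_index=True)
    )


def add_time_features(df: pd.DataFrame, lags=(1, 7)) -> pd.DataFrame:
    """Add daily new cases, a 7-day rolling mean, and lagged new cases per country."""
    out = df.copy()
    by_country = out.groupby("Country/Region")
    # JHU counts are cumulative, so differencing gives daily new cases.
    # Negative differences (later data corrections) are clipped to zero here.
    out["new_cases"] = by_country["confirmed"].diff().fillna(0).clip(lower=0)
    out["new_cases_7d"] = out.groupby("Country/Region")["new_cases"].transform(
        lambda s: s.rolling(7, min_periods=1).mean()
    )
    for lag in lags:
        out[f"new_cases_lag{lag}"] = out.groupby("Country/Region")["new_cases"].shift(lag)
    return out
```

All per-country operations go through groupby so that a lag or rolling window never crosses from one country's series into the next.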
Memory Optimization
This project is optimized to handle large datasets efficiently:
- Uses appropriate data types to minimize memory footprint
- Processes data in chunks for large files
- Employs garbage collection to free memory
- Uses compressed NumPy formats for storing processed data
- Optimizes model parameters for memory efficiency
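Several of the techniques listed above can be sketched with pandas and NumPy. This is a minimal illustration under assumptions: the 50% distinct-value threshold for categorical conversion, the chunk size, and the helper names are hypothetical and do not come from the project's code.

```python
import gc

import numpy as np
import pandas as pd


def downcast_frame(df: pd.DataFrame, max_category_ratio: float = 0.5) -> pd.DataFrame:
    """Shrink column dtypes in place and return df.

    Integers and floats are downcast to the smallest dtype that holds them;
    string columns whose distinct-value ratio is at most max_category_ratio
    (an assumed threshold) become 'category'.
    """
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_integer_dtype(s):
            df[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            df[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_string_dtype(s) and s.nunique() <= max_category_ratio * len(s):
            df[col] = s.astype("category")
    return df


def read_in_chunks(source, chunksize: int = 100_000) -> pd.DataFrame:
    """Read a CSV chunk by chunk, downcasting each chunk before concatenating."""
    parts = [downcast_frame(chunk) for chunk in pd.read_csv(source, chunksize=chunksize)]
    df = pd.concat(parts, ignore_index=True)
    del parts
    gc.collect()  # release the per-chunk frames promptly
    # Chunks may disagree on categories or integer widths, so re-apply once on the result.
    return downcast_frame(df)


def save_arrays(path, X: np.ndarray, y: np.ndarray) -> None:
    """Store feature and target arrays in one compressed .npz archive."""
    np.savez_compressed(path, X=X, y=y)
```

Arrays saved this way are read back with `np.load(path)`, which returns an archive indexed by array name, for example `np.load(path)["X"]`.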
Models
The project implements and compares four regression models:
- Random Forest Regressor: An ensemble learning method that builds multiple decision trees and averages their predictions.
- Linear Regression: A simple baseline model that assumes a linear relationship between features and target.
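- Support Vector Regression (SVR): Fits a regression function defined by a subset of training points (the support vectors), ignoring errors smaller than a margin epsilon; with a non-linear kernel such as RBF, it can capture non-linear relationships.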
- Gradient Boosting Regressor: An ensemble technique that builds trees sequentially, with each tree correcting errors made by previous ones.
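A minimal scikit-learn sketch of how the four models might be trained and compared. The hyperparameters, the chronological 80/20 split, and the choice of MAE, RMSE, and R² as metrics are assumptions; the project's train_models.py may do this differently.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_models(random_state: int = 42) -> dict:
    """The four model families named above; hyperparameters are assumptions."""
    return {
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state, n_jobs=-1),
        "Linear Regression": LinearRegression(),
        # SVR is sensitive to feature scale, so standardize inputs first.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
        "Gradient Boosting": GradientBoostingRegressor(random_state=random_state),
    }


def compare_models(X: np.ndarray, y: np.ndarray, test_fraction: float = 0.2) -> dict:
    """Fit each model on the earliest rows and score it on the latest ones.

    A chronological split (no shuffling) avoids training on future days
    when rows are ordered by date.
    """
    split = int(len(X) * (1 - test_fraction))
    X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
    results = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "MAE": mean_absolute_error(y_test, pred),
            "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
            "R2": r2_score(y_test, pred),
        }
    return results
```

RMSE is computed as the square root of `mean_squared_error` so the sketch works across scikit-learn versions, some of which lack a dedicated RMSE function.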
Hugging Face Deployment
The Gradio interface is configured for easy deployment to Hugging Face Spaces:
- Create a new Space on Hugging Face
- Upload all files to the Space
- The app automatically configures itself for the Hugging Face environment
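How gradio_app.py detects the Hugging Face environment is not documented here. One hypothetical approach is sketched below: it assumes Spaces sets a SPACE_ID environment variable and picks Gradio launch arguments accordingly. The helper names and the chosen settings are illustrative. Separately, Spaces takes the entry-point file from the `app_file` field in the YAML header of the Space's README.md (default app.py), so a Space for this project would likely need `app_file: gradio_app.py`.

```python
import os


def running_on_hf_spaces(env=None) -> bool:
    """Assumes Hugging Face Spaces exposes a non-empty SPACE_ID environment variable."""
    env = os.environ if env is None else env
    return bool(env.get("SPACE_ID"))


def launch_kwargs(env=None) -> dict:
    """Hypothetical launch settings for Gradio's launch().

    On Spaces, listen on all interfaces on port 7860; locally, stay on localhost.
    """
    if running_on_hf_spaces(env):
        return {"server_name": "0.0.0.0", "server_port": 7860}
    return {"server_name": "127.0.0.1"}
```

In gradio_app.py this could be used as `demo.launch(**launch_kwargs())`, where `demo` is the Gradio interface object. Passing the environment as a parameter keeps the detection logic testable without touching real environment variables.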
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Data sources: Johns Hopkins CSSE, Our World in Data
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Gradio