Daily Household Electricity Consumption Predictor

A web-based application that helps Nigerian households estimate their daily electricity usage in kilowatt-hours (kWh). The project also serves as a practical learning vehicle for Machine Learning Operations (MLOps), covering the full lifecycle from data preparation and model training to deployment, monitoring, and continuous improvement.

🎯 Project Goals

Business Goals

Empower Households: Provide users with a simple, accessible tool to understand and predict their daily electricity consumption
Promote Energy Awareness: Help users identify factors influencing their electricity usage, encouraging more efficient energy habits
Inform Budgeting: Enable users to better estimate their electricity bills, reducing financial surprises
Foundational MLOps Learning: Serve as a concrete project to apply and understand core MLOps principles

Machine Learning & Technical Goals

Accurate Prediction: Develop a regression model capable of predicting daily kWh consumption with acceptable accuracy
User-Friendly Interface: Create an intuitive web interface that allows easy input of features and clear display of predictions
Deployable Application: Build a self-contained application that can be deployed to a public platform
MLOps Readiness: Design the application with modularity and best practices that facilitate future MLOps implementation

🏗️ Project Structure

```
lin-re-model/
├── src/
│   ├── __init__.py
│   ├── data_generator.py       # Synthetic data generation
│   ├── model.py                # ML model training and prediction
│   └── app.py                  # Gradio web interface
├── tests/
│   ├── __init__.py
│   ├── test_data_generator.py  # Data generator tests
│   ├── test_model.py           # Model tests
│   ├── test_app.py             # Application tests
│   └── test_integration.py     # Integration tests
├── requirements.txt            # Python dependencies
├── pytest.ini                  # Pytest configuration
├── run_tests.py                # Test runner script
└── README.md                   # This file
```

🚀 Quick Start

Prerequisites

Python 3.8 or higher
pip (Python package installer)

Installation

Clone the repository (if not already done):

```
git clone <repository-url>
cd lin-re-model
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python src/app.py
```
Open your browser and navigate to http://localhost:7860

🧪 Testing

This project includes comprehensive tests to ensure code quality and functionality. The test suite covers:

Unit Tests: Individual component testing
Integration Tests: End-to-end workflow testing
Data Quality Tests: Validation of synthetic data generation
Model Performance Tests: Verification of model accuracy and consistency

Running Tests

Option 1: Using the test runner script

```
# Run all tests with coverage
python run_tests.py

# Run only unit tests
python run_tests.py unit

# Run only integration tests
python run_tests.py integration
```

Option 2: Using pytest directly

```
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with coverage report
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/test_model.py

# Run specific test class
pytest tests/test_model.py::TestElectricityConsumptionModel

# Run specific test method
pytest tests/test_model.py::TestElectricityConsumptionModel::test_train_model
```

Test Coverage

The test suite provides comprehensive coverage including:

Data Generator Tests:
- Data generation with different parameters
- Data splitting functionality
- Data persistence (save/load)
- Data quality validation
- Reproducibility checks
Model Tests:
- Model initialization and training
- Feature preparation and validation
- Prediction functionality
- Model evaluation metrics
- Model persistence (save/load)
- Error handling
Application Tests:
- Web interface functionality
- User interaction flows
- Error handling in UI
- State management
Integration Tests:
- Complete workflow testing
- End-to-end functionality
- Performance consistency
- Data quality across components

Expected Test Results

When all tests pass, you should see output similar to:

```
🧪 Running Daily Household Electricity Consumption Predictor Tests
======================================================================
============================= test session starts ==============================
platform linux -- Python 3.8.x, pytest-7.4.0, pluggy-1.0.0
rootdir: /path/to/lin-re-model
plugins: cov-4.1.0
collected 45 items

tests/test_app.py ...................                              [ 42%]
tests/test_data_generator.py ................                     [ 78%]
tests/test_integration.py ..........                              [100%]

---------- coverage: platform linux, python 3.8.x-final-0 -----------
Name                           Stmts   Miss  Cover   Missing
------------------------------------------------------------
src/__init__.py                    1      0   100%
src/app.py                       180      5    97%   180-185
src/data_generator.py             95      2    98%   95-97
src/model.py                     180      8    96%   180-188
------------------------------------------------------------
TOTAL                           456     15    97%

============================== 45 passed in 5.23s ==============================

✅ All tests passed!
```

📊 Model Features

The electricity consumption prediction model uses the following features:

Average Daily Temperature (°C): Numerical input (15-35°C range)
Day of the Week: Categorical input (Monday through Sunday)
Major Event: Boolean input (Holiday, Power Outage, etc.)
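
The three inputs above have to be turned into numbers before a linear model can use them. The sketch below shows one way to do that in plain Python: the temperature is standardised and the day of the week is one-hot encoded. It is an illustration of the idea only, not the project's code; the function names `fit_scaler` and `encode_row` and the feature layout are assumptions made for this example.

```python
from statistics import mean, pstdev

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def fit_scaler(temperatures):
    """Return the mean and population standard deviation of training temperatures."""
    mu = mean(temperatures)
    sigma = pstdev(temperatures) or 1.0  # fall back to 1.0 if all values are equal
    return mu, sigma


def encode_row(temperature_c, day, major_event, mu, sigma):
    """Turn one raw input into a numeric feature vector.

    Layout (an assumption for this sketch):
    [scaled temperature, seven one-hot day flags, event flag]
    """
    if day not in DAYS:
        raise ValueError(f"unknown day of week: {day!r}")
    scaled_temp = (temperature_c - mu) / sigma
    one_hot_day = [1.0 if day == d else 0.0 for d in DAYS]
    return [scaled_temp, *one_hot_day, 1.0 if major_event else 0.0]


# Made-up training temperatures, used only to fit the scaler.
mu, sigma = fit_scaler([20.0, 25.0, 30.0])
features = encode_row(30.0, "Sunday", True, mu, sigma)
```

The scaler statistics are computed once on training data and reused for every later prediction, so that new inputs are scaled the same way the model saw them during training.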

Model Algorithm

Algorithm: Linear Regression
Preprocessing: StandardScaler for numerical features, OneHotEncoder for categorical features
Evaluation Metrics: MSE, RMSE, MAE, R²
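
For readers who want to see what the four evaluation metrics measure, here is a minimal plain-Python sketch of their standard definitions. The function name `regression_metrics` and the sample values are assumptions for illustration; the project itself may compute these through its ML library.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R² for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n   # mean squared error
    mae = sum(abs(e) for e in errors) / n  # mean absolute error

    # R² compares the residual error with the variance of the targets.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up daily kWh values, for illustration only.
metrics = regression_metrics([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
```

RMSE and MAE are expressed in the same unit as the target (kWh), which makes them easier to relate to a household bill than MSE.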

🎮 Using the Application

Step 1: Generate Data & Train Model

Navigate to the "Data Generation & Training" tab
Adjust parameters as desired:
- Number of Data Points (100-5000)
- Noise Level (0.01-0.5)
- Training/Validation/Test Set Proportions
Click "Generate Data & Train Model"
Review the training metrics and evaluation results
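
To make the parameters in this step concrete, the sketch below generates seeded synthetic records and splits them into training, validation, and test sets. Everything in it is an assumption for illustration: the function names, the consumption formula, the 10% event probability, and the way noise is applied are invented here and are not taken from `src/data_generator.py`.

```python
import random

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def generate_data(n_points=1000, noise_level=0.1, seed=42):
    """Generate synthetic daily records (hypothetical formula, illustration only)."""
    rng = random.Random(seed)  # a private, seeded generator keeps runs reproducible
    rows = []
    for _ in range(n_points):
        temp = rng.uniform(15.0, 35.0)
        day = rng.choice(DAYS)
        event = rng.random() < 0.1  # assumed 10% chance of a major event
        base = 5.0 + 0.3 * (temp - 15.0)  # assumed: hotter days use more energy
        if day in ("Saturday", "Sunday"):
            base += 1.5  # assumed weekend uplift
        if event:
            base += 2.0  # assumed event uplift
        # Multiplicative Gaussian noise scaled by noise_level (an assumption).
        kwh = max(base * (1.0 + rng.gauss(0.0, noise_level)), 0.0)
        rows.append({"temperature_c": temp, "day": day, "major_event": event, "kwh": kwh})
    return rows


def split_data(rows, train=0.7, val=0.15, test=0.15, seed=42):
    """Shuffle a copy of the rows and cut it into train/validation/test parts."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the test set takes whatever remains
    )


rows = generate_data(n_points=1000, noise_level=0.1)
train_rows, val_rows, test_rows = split_data(rows)
```

Passing the same seed twice yields identical data, which is the property a reproducibility test would check.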

Step 2: Make Predictions

Navigate to the "Prediction" tab
Enter your parameters:
- Average Daily Temperature (15-35°C)
- Day of the Week
- Major Event (checkbox)
Click "Predict Consumption"
View your estimated daily electricity consumption

Step 3: Understand the Model

Navigate to the "Model Information" tab
Click "Show Model Information"
Review feature coefficients and model interpretation
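
As a rough guide to reading the coefficients, a linear regression prediction is the intercept plus each coefficient multiplied by its feature value. The sketch below shows that arithmetic with made-up numbers: the intercept, the coefficient values, and the feature names are hypothetical and do not come from the trained model.

```python
# Hypothetical values for illustration only; the real ones come from the trained model.
INTERCEPT = 12.0
COEFFICIENTS = {
    "temperature_scaled": 1.8,  # kWh added per one standard deviation of temperature
    "day_Saturday": 0.9,
    "day_Sunday": 1.1,
    "major_event": 2.0,
}


def contributions(features):
    """Return how much each feature adds to (or subtracts from) the prediction."""
    return {name: COEFFICIENTS.get(name, 0.0) * value for name, value in features.items()}


def predict_kwh(features):
    """Linear prediction: intercept + sum(coefficient * feature value)."""
    return INTERCEPT + sum(contributions(features).values())


# A warmer-than-average Sunday that is also a holiday (made-up input).
sunday_holiday = {"temperature_scaled": 0.5, "day_Sunday": 1.0, "major_event": 1.0}
estimate = predict_kwh(sunday_holiday)
```

A positive coefficient raises the estimate when its feature is present or increases, and a negative one lowers it. Because the temperature is standardised, its coefficient is read per standard deviation rather than per degree.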

🔧 Development

Adding New Tests

To add new tests:

Unit Tests: Add to appropriate test file in tests/
Integration Tests: Add to tests/test_integration.py
Follow naming convention: test_<functionality>
Use descriptive docstrings: Explain what the test validates
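
A new test that follows these conventions might look like the sketch below. The helper `clamp_temperature` is a hypothetical stand-in defined here so the example runs on its own; in the project, the test would import the real function under test from `src/` instead.

```python
def clamp_temperature(temperature_c, low=15.0, high=35.0):
    """Hypothetical helper: keep a temperature inside the supported range."""
    return max(low, min(high, temperature_c))


def test_clamp_temperature_limits_out_of_range_values():
    """Values outside 15-35 °C are pulled back to the nearest bound."""
    assert clamp_temperature(40.0) == 35.0, "values above the range should clamp to 35"
    assert clamp_temperature(10.0) == 15.0, "values below the range should clamp to 15"
    assert clamp_temperature(22.5) == 22.5, "in-range values should be unchanged"
```

The test name states the behaviour being checked, the docstring explains what is validated, and each assertion carries a message that makes a failure easy to diagnose.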

Test Best Practices

Isolation: Each test should be independent
Descriptive names: Test names should clearly indicate what they test
Assertions: Use specific assertions with meaningful messages
Coverage: Aim for high test coverage (>95%)
Performance: Tests should run quickly (<10 seconds total)

Running Tests in Development

During development, you can run tests in different ways:

```
# Quick test run (no coverage)
pytest -x  # Stop on first failure

# Run tests in parallel (if pytest-xdist installed)
pytest -n auto

# Run tests with detailed output
pytest -v -s

# Run tests and watch for changes
pytest-watch  # Requires pytest-watch package
```

🚀 Deployment

Local Deployment

```
python src/app.py
```

Hugging Face Spaces Deployment

Create a new Space on Hugging Face
Upload the project files
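Configure the Space to launch src/app.py (for a Gradio Space, this is typically done by setting app_file: src/app.py in the metadata block at the top of the Space's README.md)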
The application will be available at your Space URL

📈 Future Enhancements

MLOps Features (Future Phases)

Data Versioning: Implement DVC for data version control
Experiment Tracking: Integrate MLflow or Weights & Biases
Model Registry: Use MLflow Model Registry for model lifecycle management
Containerization: Create Dockerfile for reproducible environments
CI/CD: Set up GitHub Actions for automated testing and deployment
Model Monitoring: Implement monitoring for data drift and performance degradation
Continuous Training: Define triggers for automated retraining

Model Improvements

Feature Engineering: Add more complex features (historical averages, time of day, etc.)
Advanced Models: Experiment with Random Forest, Gradient Boosting, etc.
Hyperparameter Tuning: Implement automated hyperparameter optimization
Ensemble Methods: Combine multiple models for better predictions

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Ensure all tests pass
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Gradio team for the excellent web interface framework
Scikit-learn team for the machine learning library
The MLOps community for best practices and guidance