UK EPC Rating Predictor

A LightGBM gradient-boosted tree model that predicts residential Energy Performance Certificate (EPC) ratings for properties in England and Wales.

Given property characteristics a homeowner already knows (wall type, heating system, floor area, age band, etc.), the model predicts:

  • A numeric SAP 2012 efficiency score (1–100)
  • A letter grade (A–G)
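The two outputs are linked: the letter grade is the band that the SAP score falls into. Below is a minimal sketch of that mapping using the standard SAP 2012 band boundaries, which are listed under Grade calibration. The names `sap_to_grade` and `SAP_STANDARD_BOUNDARIES` are illustrative and are not taken from the repository.

```python
from bisect import bisect_right

# Standard SAP 2012 lower bounds of bands F, E, D, C, B and A (G is everything below 21)
SAP_STANDARD_BOUNDARIES = (21.0, 39.0, 55.0, 69.0, 81.0, 92.0)
GRADES = "GFEDCBA"  # position 0 is the worst band, position 6 the best


def sap_to_grade(score, boundaries=SAP_STANDARD_BOUNDARIES):
    """Map a (possibly non-integer) SAP score to an EPC letter grade.

    `boundaries` must be ascending; a score equal to a boundary falls
    into the higher band.
    """
    return GRADES[bisect_right(boundaries, score)]


print(sap_to_grade(68.4), sap_to_grade(69.0), sap_to_grade(95.2))  # D C A
```

To grade against the calibrated thresholds instead, pass the six calibrated values as `boundaries`. The rest of the function stays the same.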

Model details

Detail                    Value
Algorithm                 LightGBM (gradient-boosted trees)
Objective                 MAE regression (regression_l1)
Trees                     5,000
Leaves per tree           up to 857
Features                  40
Training rows             19,279,916
Test rows                 4,045,192
MAE (test set)            3.09 SAP points
Exact grade accuracy      77.4% (calibrated)
Within-1-band accuracy    98.7%
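The three headline metrics can be computed from true and predicted SAP scores roughly as follows. This is a sketch and not the repository's evaluation code. It makes three assumptions: true grades come from the standard SAP 2012 boundaries, predicted grades come from the calibrated ones, and "within 1 band" means the predicted grade is at most one letter away from the true grade.

```python
import numpy as np


def epc_metrics(y_true, y_pred, true_boundaries, pred_boundaries):
    """MAE in SAP points, exact grade accuracy and within-one-band accuracy.

    Both boundary lists must be ascending (G/F first, B/A last).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # np.digitize returns the band index: 0 = G ... 6 = A
    true_band = np.digitize(y_true, true_boundaries)
    pred_band = np.digitize(y_pred, pred_boundaries)
    band_gap = np.abs(true_band - pred_band)
    return {
        "mae": float(np.mean(np.abs(y_true - y_pred))),
        "exact_grade_acc": float(np.mean(band_gap == 0)),
        "within_1_band_acc": float(np.mean(band_gap <= 1)),
    }
```

A prediction that misses the true score by only a few points can still change grade if it crosses a boundary. For that reason the within-one-band figure is much higher than exact grade accuracy.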

Training data

Trained on the public EPC register maintained by MHCLG, covering all domestic EPC assessments lodged in England and Wales from 2012 to 2023.

Features

The model uses 40 features across four categories:

  • Component efficiency ratings (9): walls, roof, floor, windows, main heating, heating controls, hot water, lighting, secondary heating
  • Binary flags (5): mains gas, solar water heating, solar PV, low energy lighting, flat top storey
  • Numeric (10): floor area, room counts, floor level, storey count, glazing proportion, age band, etc.
  • Categorical (16): property type, built form, fuel type, heating system description, wall/roof/floor descriptions, etc.

Assessor-only fields not available to homeowners (transaction type, floor height, lighting outlet counts) are excluded from the feature set.
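Before prediction, each homeowner answer has to be turned into a number in the column order the model expects. The repository's actual encoding lives in src/model/predict.py. The sketch below only illustrates one common approach, and it assumes a hypothetical metadata layout: an ordered list of feature names, plus a list of training levels for each categorical feature, where a level is encoded as its position in that list. The helper name `encode_row` is also hypothetical.

```python
import math


def encode_row(inputs, feature_order, categories):
    """Turn a dict of homeowner inputs into a flat numeric row.

    feature_order: feature names in model column order (hypothetical metadata).
    categories: {feature_name: [level0, level1, ...]} for categorical
        features (hypothetical metadata); a level becomes its list index.
    Missing answers and unseen categorical levels become NaN, which
    LightGBM handles as missing values.
    """
    row = []
    for name in feature_order:
        value = inputs.get(name)
        if value is None:
            row.append(math.nan)
        elif name in categories:
            levels = categories[name]
            row.append(float(levels.index(value)) if value in levels else math.nan)
        else:
            # Numeric features and binary flags (True -> 1.0, False -> 0.0)
            row.append(float(value))
    return row
```

Mapping unseen levels to a missing value, rather than raising an error, lets the model still return an estimate when a homeowner's description matches no category seen in training.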

Usage

import json

import lightgbm as lgb

# Load the trained booster and its metadata
# (feature_meta.json also holds the calibrated grade thresholds)
booster = lgb.Booster(model_file="lgbm_epc.txt")
with open("feature_meta.json") as f:
    meta = json.load(f)

# See the full inference pipeline at:
# https://github.com/kulbinderdio/uk-epc-model/blob/main/src/model/predict.py

The full inference pipeline (including categorical encoding and grade threshold calibration) is in src/model/predict.py in the GitHub repository.
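Once the inputs are encoded, prediction is a single `Booster.predict` call followed by banding the scores. This is a minimal sketch under two assumptions. First, the rows are already encoded in model column order. Second, the six thresholds are passed in explicitly, because the key under which feature_meta.json stores them is not documented here. `predict_epc` is a hypothetical helper and not the repository's function.

```python
import numpy as np

GRADES = "GFEDCBA"  # band index 0 = G ... 6 = A


def predict_epc(booster, rows, thresholds):
    """Predict SAP scores and letter grades for already-encoded rows.

    booster: a loaded lgb.Booster (anything with a .predict(X) method works).
    rows: one encoded row, or a 2-D array-like with one row per property.
    thresholds: six ascending boundaries (G/F ... B/A), e.g. the calibrated ones.
    """
    X = np.asarray(rows, dtype=float)
    if X.ndim == 1:  # a single property
        X = X.reshape(1, -1)
    scores = np.asarray(booster.predict(X), dtype=float)
    bands = np.digitize(scores, thresholds)
    return scores, [GRADES[b] for b in bands]
```

With the objects loaded above, a call would be `predict_epc(booster, encoded_rows, thresholds)`, where `thresholds` holds the six calibrated boundaries read from `meta`. The function only needs a `.predict` method, so it can be tested with a stub instead of the full model file.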

Grade calibration

After training, grade boundaries are optimised using Nelder-Mead minimisation on the first 100K test rows. Calibrated boundaries (vs SAP 2012 standard):

Boundary    SAP 2012 standard    Calibrated
G/F         21.0                 22.3
F/E         39.0                 38.3
E/D         55.0                 53.7
D/C         69.0                 68.0
C/B         81.0                 80.1
B/A         92.0                 91.1

Calibrated thresholds are stored in feature_meta.json.
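The calibration step can be sketched with SciPy's Nelder-Mead implementation. The card does not state the objective, so this sketch assumes it is exact grade accuracy. The accuracy is negated because `minimize` minimises. The sketch also assumes that true grades are taken from the standard SAP 2012 boundaries and that the search starts from those same boundaries. The function names and tolerance values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Standard SAP 2012 boundaries G/F, F/E, E/D, D/C, C/B, B/A
SAP_STANDARD = np.array([21.0, 39.0, 55.0, 69.0, 81.0, 92.0])


def exact_grade_accuracy(thresholds, true_band, y_pred):
    """Share of rows whose predicted band matches the true band."""
    # Sorting keeps the bins ascending even if the optimiser swaps two of them
    pred_band = np.digitize(y_pred, np.sort(thresholds))
    return float(np.mean(pred_band == true_band))


def calibrate_thresholds(y_true, y_pred, start=SAP_STANDARD):
    """Nelder-Mead search for six boundaries that maximise exact grade accuracy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    true_band = np.digitize(y_true, SAP_STANDARD)  # true grades use standard bands
    result = minimize(
        lambda t: -exact_grade_accuracy(t, true_band, y_pred),
        x0=np.asarray(start, dtype=float),
        method="Nelder-Mead",
        options={"xatol": 0.01, "fatol": 1e-6, "maxiter": 2000},
    )
    return np.sort(result.x), -result.fun
```

Nelder-Mead is a natural fit here because accuracy is a step function of the boundaries and has no usable gradient. The drawback is that its flat regions make the result depend on the starting point. Starting from the standard boundaries tends to keep the calibrated values near them, which is consistent with the small shifts in the table above.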

Accuracy by property type

Property type    MAE (SAP points)    Exact grade accuracy
House            2.99                75.3%
Flat             3.04                74.6%
Maisonette       3.13                75.3%
Park home        3.84                76.8%
Bungalow         3.92                70.0%

Top features (by gain)

  1. walls_description — wall construction and insulation type
  2. construction_age_band — decade the property was built
  3. floor_description — floor construction and insulation
  4. total_floor_area — property size in m²
  5. roof_description — roof type and insulation level
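This ranking can be regenerated from the saved model with LightGBM's own importance API (`Booster.feature_name()` and `Booster.feature_importance(importance_type="gain")`). The sketch assumes the model file carries the feature names, which LightGBM text model files normally do. `top_features_by_gain` is an illustrative helper.

```python
def top_features_by_gain(booster, k=5):
    """Return the k features with the largest total split gain, highest first.

    booster: a lightgbm.Booster, e.g. lgb.Booster(model_file="lgbm_epc.txt").
    """
    names = booster.feature_name()
    gains = booster.feature_importance(importance_type="gain")
    ranked = sorted(zip(names, gains), key=lambda pair: pair[1], reverse=True)
    return [(name, float(gain)) for name, gain in ranked[:k]]
```

Gain-based importance sums the loss reduction contributed by every split on a feature. It therefore favours features that are used in many high-impact splits, which is not the same thing as split-count importance.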

Limitations

  • Predictions are estimates only — not a substitute for an official EPC from an accredited assessor
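  • Higher uncertainty near grade boundaries: a prediction within about ±3 SAP points of a boundary (roughly the test MAE) may belong in the adjacent grade
  • Bungalows have the lowest exact grade accuracy (70.0%) due to the greater variety of insulation set-ups in that property type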
  • Model trained on assessor-submitted data; self-reported inputs add a further layer of uncertainty
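An application built on the model could flag predictions that fall inside the uncertain zone near a boundary. This is a minimal sketch that assumes the ±3 SAP point margin from the list above and whichever boundaries are used for grading, standard or calibrated. `near_boundary` is a hypothetical helper.

```python
import numpy as np


def near_boundary(scores, thresholds, margin=3.0):
    """True where a predicted SAP score lies within `margin` points of any grade boundary."""
    scores = np.asarray(scores, dtype=float).reshape(-1, 1)
    bounds = np.asarray(thresholds, dtype=float).reshape(1, -1)
    # Broadcasting compares every score against every boundary at once
    return np.any(np.abs(scores - bounds) <= margin, axis=1)
```

Flagged predictions could be shown with both neighbouring grades instead of a single letter.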

Repository

Full source code, training pipeline, API, and web frontend:
https://github.com/kulbinderdio/uk-epc-model

License

MIT
