Austin Housing Price Prediction Models
This repository contains two trained machine learning models created for a first-year Data Science assignment.
The project focuses on predicting housing prices in Austin, Texas, using regression, clustering-based feature engineering, and classification.
Files in this Repository
- regression_model.pkl: trained regression model for predicting housing prices.
- classification_model.pkl: trained classification model for classifying houses into price categories.
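A minimal sketch of how one of these files might be loaded, assuming the pickles hold scikit-learn estimators and that a compatible scikit-learn version is installed locally. The helper names `load_pickled_model` and `download_and_load` are illustrative and not part of this repository; `hf_hub_download` comes from the `huggingface_hub` package.

```python
import pickle


def load_pickled_model(path):
    """Unpickle an object from a local file. Only load pickles you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def download_and_load(filename, repo_id="BarWachsman7/austin-housing-regression"):
    """Fetch a file from the Hugging Face Hub, then unpickle it."""
    # Imported lazily so load_pickled_model works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return load_pickled_model(local_path)


# Example (needs network access and a compatible scikit-learn install):
# regressor = download_and_load("regression_model.pkl")
# classifier = download_and_load("classification_model.pkl")
```

Unpickling executes code stored in the file, so this should only be done with files from a trusted source.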
Dataset
The models were trained on the Austin housing dataset.
Each row in the dataset represents a property listing.
The dataset includes features such as:
- Living area
- Number of bedrooms
- Number of bathrooms
- Property type
- School rating
- Geographic location
- Property tax rate
- Sale date
- Latest property price
The main target variable for the regression task is latestPrice.
Regression Task
The regression task predicts the estimated price of a property.
Several regression models were tested:
- Baseline Linear Regression
- Ridge Regression
- Random Forest Regressor
- Gradient Boosting Regressor
The selected regression model is:
Gradient Boosting Regressor
It was selected because it achieved the lowest Mean Absolute Error (MAE). MAE is the average absolute difference between predicted and actual prices, so it is expressed in dollars and is easy to interpret for a housing price prediction task.
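For reference, MAE and R² can both be computed from scratch. The sketch below uses plain Python and made-up prices, not values from the Austin dataset; the function names `mae` and `r_squared` are illustrative. scikit-learn's `mean_absolute_error` and `r2_score` compute the same quantities.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices for illustration only.
actual = [300_000, 450_000, 600_000]
predicted = [320_000, 430_000, 660_000]

example_mae = mae(actual, predicted)       # average dollar error
example_r2 = r_squared(actual, predicted)  # share of variance explained
```

Because R² squares each error, a few large misses can lower it sharply, while MAE weights every dollar of error equally. This is one reason the two metrics can rank models differently.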
Approximate regression results:
| Model | R² | MAE |
|---|---|---|
| Baseline Linear Regression | 0.2347 | $163,697.27 |
| Ridge Regression | 0.3573 | $135,460.67 |
| Random Forest Regressor | 0.2441 | $119,029.93 |
| Gradient Boosting Regressor | 0.2603 | $117,222.75 |
Although Ridge Regression achieved the highest R² score, Gradient Boosting was selected as the preferred operational model because it achieved the lowest average dollar-level prediction error.
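The selection logic described above can be reproduced directly from the reported numbers. This is a small sketch that copies the table values into a dictionary; the variable names are illustrative.

```python
# Values copied from the regression results table.
results = {
    "Baseline Linear Regression": {"r2": 0.2347, "mae": 163_697.27},
    "Ridge Regression": {"r2": 0.3573, "mae": 135_460.67},
    "Random Forest Regressor": {"r2": 0.2441, "mae": 119_029.93},
    "Gradient Boosting Regressor": {"r2": 0.2603, "mae": 117_222.75},
}

# Lower MAE is better; higher R² is better.
best_by_mae = min(results, key=lambda name: results[name]["mae"])
best_by_r2 = max(results, key=lambda name: results[name]["r2"])

# How many dollars of average error the MAE-selected model saves
# compared with the model that has the highest R².
mae_gap = results[best_by_r2]["mae"] - results[best_by_mae]["mae"]

print(f"Lowest MAE: {best_by_mae}")
print(f"Highest R²: {best_by_r2}")
print(f"MAE gap: ${mae_gap:,.2f}")
```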
Clustering
K-Means clustering was used as part of the feature engineering process.
An exploratory clustering step with k=6 was used for geographic visualization.
The final clustering value used in the model pipeline was k=4, based on the elbow analysis.
Cluster-based features were added to help the models capture geographic and structural market patterns.
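The clustering step can be illustrated with a small, dependency-free K-Means sketch. It assumes the geographic features are coordinate pairs and uses toy points rather than real Austin locations. The deterministic farthest-point initialization is a simplification chosen so the example is reproducible; it is not necessarily what the project used (scikit-learn's `KMeans` defaults to k-means++). All function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centers, inertia)."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Toy data: four well-separated groups of three points each.
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 0), (0, 10), (10, 10)]
    for dx, dy in [(0, 0), (1, 0), (0, 1)]
]

# Elbow analysis: inertia (within-cluster sum of squares) for each candidate k.
inertia_by_k = {k: kmeans(points, k)[2] for k in range(1, 7)}

# Cluster-based feature: append each point's cluster label for the chosen k.
labels, centers, _ = kmeans(points, 4)
features = [(x, y, lab) for (x, y), lab in zip(points, labels)]
```

In an elbow analysis, the chosen k is the point where adding more clusters stops reducing inertia substantially. On this toy data the drop flattens at k=4 because the points form four groups.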
Classification Task
The regression problem was also converted into a classification problem.
Instead of predicting the exact price, the properties were divided into three price categories:
- Affordable
- Mid-Range
- Luxury
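The model card does not state how the category boundaries were chosen. As one plausible approach, the sketch below assumes tercile cut points, so that each category holds roughly a third of the listings. The thresholds, prices, and function names here are hypothetical.

```python
def tercile_thresholds(prices):
    """Return two cut points that split the sorted prices into three roughly equal groups."""
    ordered = sorted(prices)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def price_category(price, low, high):
    """Map a price to one of the three categories, given the two cut points."""
    if price < low:
        return "Affordable"
    if price < high:
        return "Mid-Range"
    return "Luxury"


# Made-up prices for illustration only.
prices = [250_000, 310_000, 420_000, 480_000, 650_000, 1_200_000]
low, high = tercile_thresholds(prices)
categories = [price_category(p, low, high) for p in prices]
```

Balanced classes like these make macro-averaged metrics easier to interpret, since no single category dominates the score.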
The classification models tested were:
- Logistic Regression
- Random Forest Classifier
- Gradient Boosting Classifier
The selected classification model is:
Gradient Boosting Classifier
Approximate classification results:
| Model | Macro F1 | Macro ROC-AUC |
|---|---|---|
| Logistic Regression | ~0.72 | ~0.90 |
| Random Forest Classifier | ~0.79 | ~0.93 |
| Gradient Boosting Classifier | ~0.80 | ~0.936 |
Gradient Boosting Classifier was selected because it achieved the strongest overall results across the reported classification metrics.
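Macro F1 is the unweighted mean of the per-class F1 scores, so each price category counts equally. A minimal from-scratch sketch on made-up labels follows; the function name `macro_f1` is illustrative, and scikit-learn's `f1_score` with `average="macro"` computes the same quantity.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only.
classes = ["Affordable", "Mid-Range", "Luxury"]
y_true = ["Affordable", "Affordable", "Mid-Range", "Mid-Range", "Luxury", "Luxury"]
y_pred = ["Affordable", "Mid-Range", "Mid-Range", "Mid-Range", "Luxury", "Affordable"]

score = macro_f1(y_true, y_pred, classes)
```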
Intended Use
These models were created for an academic Data Science assignment.
They are intended to demonstrate:
- Data cleaning
- Exploratory Data Analysis
- Feature engineering
- Clustering
- Regression modeling
- Classification modeling
- Model evaluation
- Saving trained models with Pickle
Limitations
These models should not be used for real financial or real estate decisions.
The predictions are based on a specific dataset and may not generalize to other cities, time periods, or real estate markets.
The models are intended for educational purposes only.
Author
Created by Bar Wachsman as part of a Data Science assignment.