🍽️ Swiggy Restaurant Rating Predictor: Classification, Regression & Clustering

Reichman University | Adelson School of Entrepreneurship | Introduction to Data Science | 2026


🎬 Video Presentation


📌 Project Overview

This project applies a full end-to-end data science pipeline to the Swiggy Restaurants Dataset, a dataset of over 107,000 restaurants across India collected from the Swiggy food delivery platform.

"Can we predict a restaurant's rating based on its features: cuisine, price, location, and popularity?"

  • Research Question: Can we predict a restaurant's rating based on its features?
  • Dataset: Swiggy Restaurants Dataset (Kaggle)
  • Dataset Size: ~107,000 restaurants across India
  • Target Variable: Rating, the customer rating (1.0–5.0)
  • Task Types: Regression + Classification + Clustering

📊 Dataset

  • Source: Kaggle, Swiggy Restaurants Dataset
  • Size: 107,000 rows × 6 columns
  • Numeric features: Average Price, Number of Ratings, Number of Offers
  • Categorical features: Cuisine, Location, Pure Veg
  • Target (Regression): Rating, the customer rating (1.0–5.0)
  • Target (Classification): Rating binned into 3 quantile-based classes: Low, Medium, High

🧹 Part 2: Data Cleaning & EDA

Data Cleaning

  • Removed 33,719 rows with missing Rating values (our target variable)
  • Converted Average Price from text format (e.g. "₹250 for two") to numeric
  • Converted Number of Ratings from text format (e.g. "10+ ratings") to numeric
  • Encoded categorical columns: Cuisine, Location, Pure Veg
  • Final clean dataset: ~107,000 rows ready for modeling
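
The two text-to-number conversions listed above could look roughly like the sketch below. The helper names (`parse_price`, `parse_rating_count`, `is_valid_rating`) are hypothetical and not taken from the notebook, and the regex only handles plain digit strings like the two examples quoted; abbreviated counts or other formats would need extra handling.

```python
import re

# Matches the first run of digits, allowing thousands separators ("1,000").
_NUMBER = re.compile(r"\d[\d,]*")


def parse_price(text):
    """Extract the rupee amount from strings like '₹250 for two'."""
    match = _NUMBER.search(text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating_count(text):
    """Extract the lower bound from strings like '10+ ratings'."""
    match = _NUMBER.search(text or "")
    return int(match.group().replace(",", "")) if match else None


def is_valid_rating(value):
    """Keep only ratings inside the valid 1.0-5.0 range; missing values fail."""
    return value is not None and 1.0 <= value <= 5.0
```

Rows for which `is_valid_rating` returns `False` correspond to the missing or out-of-range ratings that were dropped during cleaning.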

🔍 Outlier Detection

Rating Distribution

The Rating distribution is slightly left-skewed, with most values between 3.5 and 4.5. Ratings outside the valid 1.0–5.0 range were treated as invalid and removed during cleaning. No extreme outliers were found in numeric features.

Decision: Keep all valid data. No artificial capping was applied.


❓ Question 1: What is the average rating by cuisine type?

Q1 Cuisine Ratings

Answer: All top 10 cuisine types have very similar average ratings, clustered tightly around 4.0. South Indian and Chinese cuisines rank slightly higher, but the differences are minimal, which suggests that cuisine type alone does not determine a restaurant's success on Swiggy.


❓ Question 2: Does price affect rating?

Q2 Price vs Rating

Answer: There is almost no correlation between price and rating (r = 0.06). Expensive restaurants (₹1,000+) are rated no higher than budget ones (₹100–300). Swiggy customers rate based on experience, not price tag.


❓ Question 3: Do vegetarian restaurants get better ratings?

Q3 Veg Ratings

Answer: Pure vegetarian and non-vegetarian restaurants receive nearly identical average ratings (~4.0). Being vegetarian gives no rating advantage on Swiggy; quality matters more than dietary category.


❓ Question 4: Which cities have the most restaurants on Swiggy?

Q4 Top Cities

Answer: The top 10 cities each have between 1,400 and 1,600 restaurants on Swiggy. Kanchipuram and Kanpur lead slightly. Swiggy has broad and balanced geographic coverage across India, and no single city dominates the platform.


❓ Question 5: Do more popular restaurants get better scores?

Q5 Popularity vs Rating

Answer: Restaurants with more ratings tend to cluster around stable scores of 3.5–4.5, while restaurants with very few ratings show more extreme and unreliable scores. Popularity stabilizes ratings but does not guarantee higher ones. Social proof plays a role in anchoring expectations.


🌍 BONUS: Correlation Heatmap

Correlation Heatmap

Note: The interactive version of this heatmap is available in the notebook.

  • Average Price has very weak correlation with Rating (r = 0.06)
  • Number of Ratings has a marginally stronger, but still very weak, correlation with Rating (r = 0.07)
  • No single numeric feature strongly predicts the rating alone

This finding motivated the entire feature engineering approach in Part 4.
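
To make the size of these coefficients concrete, here is a minimal sketch of the computation, assuming the reported r values are Pearson coefficients (the usual default for a correlation heatmap). The toy `prices` and `ratings` lists are made up for illustration and are not drawn from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: price moves a lot, rating barely moves with it.
prices = [100, 200, 300, 400, 500]
ratings = [4.1, 3.9, 4.2, 4.0, 4.1]
r = pearson_r(prices, ratings)
```

For a single-feature linear fit, the share of variance explained is r squared, so r = 0.06 corresponds to about 0.36% of the variance in Rating.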


βš™οΈ Part 3: Baseline Model

Before building complex models, we established a Linear Regression baseline using only raw features β€” no engineering, no transformations. This gives us a clear reference point to measure how much our improvements actually help.

| Metric | Value |
|---|---|
| MAE | 0.3635 |
| RMSE | 0.4916 |
| R² | 0.0114 |

The baseline explains only 1.1% of the variance in ratings, a humble but expected result. Most restaurants cluster tightly between 3.5–4.5, leaving little signal for a simple linear model to learn from.
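
For readers unfamiliar with the three metrics, the sketch below shows how they are defined. The function name `regression_metrics` and the four sample ratings are illustrative assumptions, not code or data from the notebook.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # unexplained variation
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    return mae, rmse, 1 - ss_res / ss_tot


# A "model" that always predicts the mean rating scores R^2 = 0 by definition.
actual = [3.5, 4.0, 4.5, 4.0]
mean_only = [4.0] * len(actual)
mae, rmse, r2 = regression_metrics(actual, mean_only)
```

Because always predicting the mean gives R² = 0, an R² of 0.0114 means the baseline is only marginally better than guessing the average rating for every restaurant.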

Predicted vs Actual

The model struggles to predict outside the 4.0–4.5 range, confirming that raw features alone are not enough. Most predictions cluster around the mean, missing the full range of actual ratings.

Baseline Feature Importance

All feature importance values are extremely low, confirming that raw features alone are not enough to predict restaurant ratings. This motivated the entire feature engineering approach in Part 4.

Challenge set: Can we do significantly better with feature engineering and more powerful models?


🔧 Part 4: Feature Engineering

Raw data alone is rarely enough. We engineered 4 new features designed to capture patterns the original columns couldn't express:

| Feature | Description | Intuition |
|---|---|---|
| Is_Expensive | Price above ₹300 | Captures the premium restaurant segment |
| Has_Many_Offers | 4 or more offers available | Captures promotional activity level |
| Is_Popular | More than 50 ratings | Captures established vs new restaurants |
| Cluster | K-Means cluster ID (k=4) | Groups similar restaurants by behavioral profile |
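
The three binary features follow directly from the thresholds in the table. A minimal sketch, assuming each restaurant is a dict keyed by the cleaned column names; the function name `engineer_features` is hypothetical, and the Cluster feature is omitted because it comes from the clustering step.

```python
def engineer_features(row):
    """Derive the three threshold-based flags from one cleaned restaurant row."""
    return {
        "Is_Expensive": int(row["Average Price"] > 300),       # price above 300
        "Has_Many_Offers": int(row["Number of Offers"] >= 4),  # 4 or more offers
        "Is_Popular": int(row["Number of Ratings"] > 50),      # more than 50 ratings
    }


# Made-up example row, not taken from the dataset.
example = {"Average Price": 450.0, "Number of Offers": 2, "Number of Ratings": 100}
flags = engineer_features(example)
```

Note the boundary behaviour implied by the wording: a price of exactly ₹300 or exactly 50 ratings yields 0, while exactly 4 offers yields 1.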

K-Means Clustering (k=4)

Applied K-Means clustering on Average Price, Number of Ratings, and Number of Offers to automatically group restaurants into 4 behavioral profiles, with unsupervised learning working alongside our supervised models:

  • Cluster 0: Budget restaurants with few ratings
  • Cluster 1: Mid-range restaurants
  • Cluster 2: Popular restaurants with many ratings
  • Cluster 3: High-offer restaurants

Clustering 3D

Clustering PCA


📐 BONUS: Elbow Method

Elbow Method

Used the Elbow Method to support K=4 as the number of clusters: the inertia drops sharply before K=4 and plateaus after it, which is consistent with our choice.
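
Inertia is the sum of squared distances from each point to its nearest centroid, and the elbow is where adding another cluster stops reducing it much. The sketch below is a hand-rolled k-means on toy 2-D points, not the notebook's code: the deterministic farthest-point initialisation is an assumption made for reproducibility, and the data has three obvious groups rather than four.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, iterations=20):
    """Run Lloyd's algorithm and return the final inertia."""
    # Farthest-point initialisation (assumption): start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Toy data with three well-separated groups (illustrative only).
toy_points = [(0, 0), (1, 0), (0, 1),
              (10, 0), (11, 0), (10, 1),
              (0, 10), (1, 10), (0, 11)]
inertias = {k: kmeans_inertia(toy_points, k) for k in (1, 2, 3, 4)}
```

On this toy data the inertia falls steeply up to k=3 and only slightly afterwards, which is the same shape the elbow plot is read for.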


💡 Feature Engineering Insights

  • Number of Ratings: remained the top feature in both regression and classification
  • Location: the second most important feature in regression and the third in classification
  • Cluster: ranked fourth in both models, so the clustering step contributed usable predictive signal
  • Binary features (Is_Expensive, Is_Popular) helped the model distinguish restaurant segments

Business Insight: WHERE a restaurant is located and HOW POPULAR it already is matter far more than what it serves or how expensive it is. Location and reputation drive ratings more than menu or pricing strategy.


🤖 Part 5: Regression Models

Trained 3 different regression models on the engineered dataset and compared them against the baseline, as an iterative improvement process:

| Model | MAE | R² |
|---|---|---|
| Linear Regression (Baseline) | 0.3635 | 0.0114 |
| Linear Regression (Improved) | 0.3623 | 0.0160 |
| Random Forest | 0.3481 | 0.0510 |
| Gradient Boosting ✅ | 0.3463 | 0.0776 |

🏆 Winner: Gradient Boosting (MAE = 0.3463, R² = 0.0776)

Why Gradient Boosting wins: It builds trees iteratively, each one correcting the mistakes of the previous one, which makes it better at capturing the subtle patterns in tightly clustered restaurant ratings.
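
The "each tree corrects the previous one" idea can be illustrated with a stripped-down booster. This is a teaching sketch under simplifying assumptions (one feature, depth-1 trees, squared-error loss, made-up data), not the configuration of the trained model; `fit_stump` and `boost` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds, learning_rate=0.5):
    """Return the training MSE after each boosting round."""
    predictions = [sum(ys) / len(ys)] * len(ys)  # start from the mean
    history = []
    for _ in range(rounds):
        # For squared error, the residuals are what the next tree must learn.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return history


# Illustrative data only: one feature, ratings rising gently with it.
mse_per_round = boost([1, 2, 3, 4, 5, 6], [3.6, 3.8, 4.0, 4.1, 4.4, 4.5], rounds=5)
```

Each round fits a small tree to what is still unexplained, so the training error shrinks step by step; the learning rate controls how much of each correction is applied.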

Model Comparison


Feature Importance

Feature Importance

Top predictors:

  • Number of Ratings (#1): more reviewed restaurants are more predictable
  • Location (#2): where a restaurant is matters more than what it serves
  • Cuisine (#3): cuisine type has some influence
  • Cluster (#4): our engineered feature added real predictive value!

BONUS: Residual Analysis

Residual Analysis

  • Residuals vs Predicted: Residuals are centered around 0, so the model is unbiased
  • Distribution of Residuals: Near-normal distribution, so the model makes symmetric errors
  • The spread reflects the inherent difficulty of predicting ratings in the narrow 3.5–4.5 range
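
The checks above boil down to two numbers: the mean of the residuals (is there a systematic offset?) and their spread. A minimal sketch with made-up values; `residual_summary` is a hypothetical helper, not a function from the notebook.

```python
import statistics


def residual_summary(actual, predicted):
    """Return (mean, population standard deviation) of actual - predicted."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return statistics.mean(residuals), statistics.pstdev(residuals)


# Illustrative case: a model that predicts 4.0 for every restaurant.
mean_residual, spread = residual_summary([3.5, 4.0, 4.5], [4.0, 4.0, 4.0])
```

In this toy case the mean residual is 0 even though low ratings are over-predicted and high ratings under-predicted, so a zero mean is best read together with the residuals-vs-predicted plot rather than on its own.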

BONUS: Hyperparameter Tuning

Used GridSearchCV with 3-fold cross validation to find optimal parameters:

  • Best params: learning_rate=0.05, max_depth=5, n_estimators=100
  • Best MAE: 0.3497, no better than the 0.3463 obtained with our initial parameters, which suggests they were already well chosen

💾 Part 6: Saved Model

The winning Gradient Boosting Regressor was saved and uploaded to this HuggingFace repository.

  • Model: Gradient Boosting Regressor
  • File: swiggy_model.pkl
  • MAE: 0.3463
  • R²: 0.0776

🏷️ Part 7: Regression → Classification

Converted Rating into 3 meaningful classes using quantile binning:

| Class | Definition | % of Data |
|---|---|---|
| 0 (Low) | Bottom 33% | 33.8% |
| 1 (Medium) | Middle 33% | 42.6% |
| 2 (High) | Top 33% | 23.7% |

Class Distribution

Why quantile binning? It derives the cut-offs from the data instead of hand-picked thresholds and divides restaurants into meaningful Low / Medium / High groups of roughly similar size.
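
One plausible reason the class shares are 33.8% / 42.6% / 23.7% rather than exact thirds is that ratings take few distinct values, so many restaurants tie at the cut points. The sketch below illustrates this with made-up ratings; the boundary convention in `tertile_labels` is an assumption and may differ from the binning call used in the notebook.

```python
from collections import Counter


def tertile_labels(ratings):
    """Label each rating 0 (Low), 1 (Medium) or 2 (High) by tertile cut points."""
    ordered = sorted(ratings)
    low_cut = ordered[len(ordered) // 3]
    high_cut = ordered[2 * len(ordered) // 3]
    # Assumed convention: values equal to a cut point fall in the middle class.
    return [0 if r < low_cut else 2 if r > high_cut else 1 for r in ratings]


# Nine illustrative ratings with a four-way tie at 4.0.
sample = [3.2, 3.5, 3.8, 4.0, 4.0, 4.0, 4.0, 4.3, 4.6]
class_counts = Counter(tertile_labels(sample))
```

Here the tie at 4.0 pushes four of nine restaurants into Medium and leaves only two in High, mirroring the shape of the real class distribution.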

Why F1 over accuracy? The dataset has mild class imbalance: a model predicting "Medium" for everything would get about 43% accuracy without learning anything. F1 (weighted) is a fairer metric.

Why Recall matters more here: It's worse to miss a truly great restaurant (False Negative) than to occasionally recommend a mediocre one (False Positive). The False Negative is the more critical error, because predicting LOW when the restaurant is actually HIGH means hiding good restaurants from users.
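
To see how weighted F1 and per-class recall penalise the "always Medium" strategy, here is a minimal from-scratch sketch. The function names and the nine-item label lists are illustrative assumptions, not notebook code.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's share."""
    support = Counter(y_true)
    return sum(
        count / len(y_true) * per_class_scores(y_true, y_pred, label)[2]
        for label, count in support.items()
    )


# Illustrative labels: 3 Low, 4 Medium, 2 High; the model always says Medium.
y_true = [0] * 3 + [1] * 4 + [2] * 2
y_pred = [1] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = weighted_f1(y_true, y_pred)
high_recall = per_class_scores(y_true, y_pred, 2)[1]
```

The constant predictor reaches about 44% accuracy on this toy set but only about 0.27 weighted F1, and its recall on the High class is 0, which is exactly the failure the recall argument above is concerned with.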


🧠 Part 8: Classification Models

Trained 3 different classification models to predict restaurant rating classes (Low / Medium / High):

| Model | Accuracy | F1 (weighted) |
|---|---|---|
| Logistic Regression | 0.43 | 0.33 |
| Random Forest | 0.45 | 0.46 |
| Gradient Boosting ✅ | 0.50 | 0.46 |

🏆 Winner: Gradient Boosting Classifier, with the best accuracy (0.50) and an F1 score tied with Random Forest (0.46).


Confusion Matrices

Confusion Matrices

Key observations:

  • Logistic Regression: predicted almost everything as "Medium", struggled with class separation
  • Random Forest: more balanced predictions across all 3 classes
  • Gradient Boosting: best overall accuracy (50%) with most correct predictions

Feature Importance (Classification)

Feature Importance Classification

  • Number of Ratings (#1): most reviewed restaurants are most predictable
  • Cuisine_encoded (#2): cuisine type plays a strong role in rating class
  • Location_encoded (#3): where a restaurant is matters
  • Cluster (#4): our engineered feature contributed real signal!

🎯 BONUS: Hyperparameter Tuning (Before vs After)

Tuning Comparison

Applied GridSearchCV with 12 combinations and 3-fold cross validation:

| Metric | Before Tuning | After Tuning |
|---|---|---|
| F1 (weighted) | 0.46 | 0.58 |
| Accuracy | 0.50 | 0.50 |
| Best Params | - | n_estimators=200, max_depth=None, min_samples_split=5 |

+26% improvement in F1 score from tuning alone.
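
The search space itself is not listed here, so the grid below is purely illustrative: a hypothetical 2 × 3 × 2 grid that contains the reported best parameters and happens to yield 12 combinations. It shows how the number of model fits follows from the grid size and the fold count.

```python
from itertools import product

# Hypothetical grid: only the best values (200, None, 5) come from the results.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5],
}

# Every combination of one value per parameter, as an exhaustive search tries.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

cv_folds = 3
total_fits = len(combinations) * cv_folds  # each combination is fit once per fold
```

With 12 combinations and 3 folds, an exhaustive search trains 36 models before refitting the best one, which is why grids are usually kept small on a dataset of this size.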


💡 Part 8 Summary

| Model | Accuracy | F1 (weighted) |
|---|---|---|
| Logistic Regression | 0.43 | 0.33 |
| Random Forest | 0.45 | 0.46 |
| Gradient Boosting | 0.50 | 0.46 |
| Gradient Boosting (Tuned) ✅ | 0.50 | 0.58 |

The same four features (Number of Ratings, Location, Cuisine, and Cluster) rank as the top predictors in both the regression and classification models, which supports our feature engineering choices.


📦 Repository Contents

| File | Description |
|---|---|
| README.md | This file |
| Copy_of_Assignment_2_...ipynb | Full Colab notebook with all code and outputs |
| swiggy_model.pkl | Winning Gradient Boosting Regression model |
| swiggy_classifier.pkl | Winning Gradient Boosting Classification model |

📝 Project Summary

This project demonstrates a complete data science pipeline applied to the Swiggy Restaurants Dataset of over 107,000 restaurants across India. Starting from raw restaurant data, we built a system that predicts customer ratings and classifies restaurants into Low, Medium, and High rated groups.

The combination of feature engineering, clustering, and stronger models raised the regression R² from 0.0114 to 0.0776, roughly a 6.8-fold increase over the baseline, although MAE improved only modestly (0.3635 → 0.3463). Hyperparameter tuning alone lifted classification F1 by 26% (0.46 → 0.58).
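
The headline percentages can be reproduced from the reported metrics with a one-line relative-change formula. The helper name `relative_change` is illustrative; the input numbers are the ones reported in the tables above.

```python
def relative_change(before, after):
    """Change relative to the starting value: (after - before) / before."""
    return (after - before) / before


r2_gain = relative_change(0.0114, 0.0776)     # regression R^2, baseline -> best
mae_change = relative_change(0.3635, 0.3463)  # regression MAE, baseline -> best
f1_gain = relative_change(0.46, 0.58)         # classification F1, before -> after tuning
```

Read together, these show a large relative gain in R² from a very small starting value, a MAE reduction of under 5%, and a 26% relative gain in weighted F1.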

Author: Amit Ben Avraham | Reichman University, Adelson School of Entrepreneurship | Introduction to Data Science | 2026


🤖 AI Usage Disclosure

This project was completed with assistance from Claude (Anthropic) for code debugging, chart design, and README writing. All analysis, decisions, and interpretations are my own.
