Swiggy Restaurant Rating Predictor: Classification, Regression & Clustering
Reichman University | Adelson School of Entrepreneurship | Introduction to Data Science | 2026
Video Presentation
Project Overview
This project applies a full end-to-end data science pipeline to the Swiggy Restaurants Dataset, a dataset of over 107,000 restaurants across India, collected from the Swiggy food delivery platform.
"Can we predict a restaurant's rating based on its features: cuisine, price, location, and popularity?"
| Research Question | Can we predict a restaurant's rating based on its features? |
| Dataset | Swiggy Restaurants Dataset (Kaggle) |
| Dataset Size | ~107,000 restaurants across India |
| Target Variable | Rating: customer rating (1.0–5.0) |
| Task Types | Regression + Classification + Clustering |
Dataset
- Source: Kaggle (Swiggy Restaurants Dataset)
- Size: 107,000 rows × 6 columns
- Numeric features: Average Price, Number of Ratings, Number of Offers
- Categorical features: Cuisine, Location, Pure Veg
- Target (Regression): `Rating`, the customer rating (1.0–5.0)
- Target (Classification): `Rating` converted to 3 roughly balanced classes (Low, Medium, High)
Part 2: Data Cleaning & EDA
Data Cleaning
- Removed 33,719 rows with missing `Rating` values (our target variable)
- Converted `Average Price` from text format (e.g. "₹250 for two") to numeric
- Converted `Number of Ratings` from text format (e.g. "10+ ratings") to numeric
- Encoded categorical columns: `Cuisine`, `Location`, `Pure Veg`
- Final clean dataset: ~107,000 rows ready for modeling
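The two text-to-numeric conversions can be sketched with regular expressions. This is a minimal illustration, not the notebook's code: the function names are hypothetical, and the handling of a "K" suffix (as in "1K+ ratings") is an assumption about the raw data.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Extract the first number from a string such as '₹250 for two'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # no digits at all, treat as missing
    return float(match.group().replace(",", ""))


def parse_rating_count(text: str) -> Optional[int]:
    """Turn '10+ ratings' into 10, the lower bound the label states."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*([Kk])?", text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):  # assumed 'K' suffix meaning thousands
        value *= 1000
    return int(value)


print(parse_price("₹250 for two"))        # 250.0
print(parse_rating_count("10+ ratings"))  # 10
```

Strings without digits return `None`, so they can be handled as missing values in the same pass.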
Outlier Detection
The Rating distribution is slightly left-skewed, with most values between 3.5 and 4.5. Ratings outside the valid 1.0–5.0 range were treated as invalid and removed during cleaning. No extreme outliers were found in numeric features.
Decision: Keep all valid data; no artificial capping was applied.
Question 1: What is the average rating by cuisine type?
Answer: All top 10 cuisine types have very similar average ratings, clustered tightly around 4.0. South Indian and Chinese cuisines rank slightly higher, but the differences are minimal, which suggests that cuisine type alone does not determine a restaurant's rating on Swiggy.
Question 2: Does price affect rating?
Answer: There is almost no correlation between price and rating (r = 0.06). Expensive restaurants (₹1,000+) are rated no higher than budget ones (₹100–300), which suggests that Swiggy customers rate the experience rather than the price tag.
Question 3: Do vegetarian restaurants get better ratings?
Answer: Pure vegetarian and non-vegetarian restaurants receive nearly identical average ratings (~4.0). Being vegetarian gives no rating advantage on Swiggy; dietary category by itself does not move the rating.
Question 4: Which cities have the most restaurants on Swiggy?
Answer: The top 10 cities each have between 1,400 and 1,600 restaurants on Swiggy, with Kanchipuram and Kanpur leading slightly. Swiggy has broad and balanced geographic coverage across India; no single city dominates the platform.
Question 5: Do more popular restaurants get better scores?
Answer: Restaurants with more ratings tend to cluster around stable scores of 3.5–4.5, while restaurants with very few ratings show more extreme and less reliable scores. Popularity stabilizes ratings but does not guarantee higher ones, as expected when a score is averaged over more reviews. Social proof may also play a role in anchoring expectations.
BONUS: Correlation Heatmap
Note: The interactive version of this heatmap is available in the notebook.
- `Average Price` has a very weak correlation with `Rating` (r = 0.06)
- `Number of Ratings` is only slightly more correlated with `Rating` (r = 0.07)
- No single numeric feature strongly predicts the rating alone
This finding motivated the entire feature engineering approach in Part 4.
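To make the r values concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The toy lists are hypothetical and only check the function. The last line uses the fact that, for a one-feature linear fit, R² equals r², so r = 0.06 corresponds to well under 1% of variance explained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: a perfectly linear relationship gives r = 1.
perfect = pearson_r([1, 2, 3], [2, 4, 6])

# Share of variance a single feature with r = 0.06 can explain linearly.
explained = 0.06 ** 2
print(round(explained, 4))  # 0.0036
```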
Part 3: Baseline Model
Before building complex models, we established a Linear Regression baseline using only raw features: no engineering, no transformations. This gives us a clear reference point to measure how much our improvements actually help.
| Metric | Value |
|---|---|
| MAE | 0.3635 |
| RMSE | 0.4916 |
| R² | 0.0114 |
The baseline explains only 1.1% of the variance in ratings, a humble but expected result. Most restaurants cluster tightly between 3.5 and 4.5, leaving little signal for a simple linear model to learn from.
The model struggles to predict outside the 4.0–4.5 range: most predictions cluster around the mean and miss the full range of actual ratings.
All feature importance values are extremely low, confirming that raw features alone are not enough to predict restaurant ratings.
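For reference, the three baseline metrics can be computed as below. This is a sketch with hypothetical toy ratings, not the project's data. It also shows why an R² near zero means "barely better than predicting the mean": a model that always predicts the mean scores exactly 0.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)       # residual sum of squares
    rmse = sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical ratings; the "model" always predicts their mean.
y_true = [3.5, 4.0, 4.5, 4.0]
mean_pred = [sum(y_true) / len(y_true)] * len(y_true)
mae, rmse, r2 = regression_metrics(y_true, mean_pred)
print(mae, r2)
```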
Challenge set: Can we do significantly better with feature engineering and more powerful models?
Part 4: Feature Engineering
Raw data alone is rarely enough. We engineered 4 new features designed to capture patterns the original columns couldn't express:
| Feature | Description | Intuition |
|---|---|---|
| `Is_Expensive` | Price above ₹300 | Captures the premium restaurant segment |
| `Has_Many_Offers` | 4 or more offers available | Captures promotional activity level |
| `Is_Popular` | More than 50 ratings | Captures established vs new restaurants |
| `Cluster` | K-Means cluster ID (k=4) | Groups similar restaurants by behavioral profile |
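The three binary features follow directly from the thresholds in the table. A minimal sketch, assuming each restaurant is a dict keyed by the column names above (the helper name is hypothetical):

```python
def add_binary_features(row: dict) -> dict:
    """Return a copy of the row with the three threshold flags added."""
    enriched = dict(row)
    enriched["Is_Expensive"] = int(row["Average Price"] > 300)       # strictly above 300
    enriched["Has_Many_Offers"] = int(row["Number of Offers"] >= 4)  # 4 or more
    enriched["Is_Popular"] = int(row["Number of Ratings"] > 50)      # more than 50
    return enriched


# Hypothetical restaurant sitting exactly on each threshold.
example = add_binary_features(
    {"Average Price": 300, "Number of Offers": 4, "Number of Ratings": 50}
)
print(example["Is_Expensive"], example["Has_Many_Offers"], example["Is_Popular"])  # 0 1 0
```

The boundary example shows which side each threshold falls on: "above" and "more than" are strict, "4 or more" is inclusive.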
K-Means Clustering (k=4)
Applied K-Means clustering on Average Price, Number of Ratings, and Number of Offers to automatically group restaurants into 4 behavioral profiles, with unsupervised learning working alongside our supervised models:
- Cluster 0: Budget restaurants with few ratings
- Cluster 1: Mid-range restaurants
- Cluster 2: Popular restaurants with many ratings
- Cluster 3: High-offer restaurants
BONUS: Elbow Method
Used the Elbow Method to check K=4 as the number of clusters: the inertia drops sharply up to K=4 and plateaus after, which supports our choice.
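The idea behind the elbow check can be sketched without any libraries. This is a pure-Python stand-in for illustration only: the toy points are hypothetical, and the deterministic farthest-point initialisation is a simplification, not how the notebook's K-Means was configured. Inertia is the sum of squared distances from each point to its nearest center.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns (centers, inertia)."""
    # Deterministic farthest-point initialisation (an assumption for this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


# Hypothetical toy data: four well-separated groups of three points each.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 0), (10, 1), (11, 0),
    (0, 10), (0, 11), (1, 10),
    (10, 10), (10, 11), (11, 10),
]
inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Plotting inertia against k and looking for the point where the curve flattens is the elbow heuristic; it is a visual guide rather than a formal test.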
Feature Engineering Insights
- `Number of Ratings` remained the top feature in both regression and classification
- `Location` ranked second in regression and third in classification
- `Cluster` ranked fourth in both models, so the clustering step contributed some predictive signal
- Binary features (`Is_Expensive`, `Is_Popular`) helped the model distinguish restaurant segments
Business Insight: WHERE a restaurant is located and HOW POPULAR it already is matter far more than what it serves or how expensive it is. Location and reputation drive ratings more than menu or pricing strategy.
Part 5: Regression Models
Trained 3 different regression models on the engineered dataset and compared them against the baseline, as an iterative improvement process:
| Model | MAE | R² |
|---|---|---|
| Linear Regression (Baseline) | 0.3635 | 0.0114 |
| Linear Regression (Improved) | 0.3623 | 0.0160 |
| Random Forest | 0.3481 | 0.0510 |
| Gradient Boosting (best) | 0.3463 | 0.0776 |
Winner: Gradient Boosting (MAE = 0.3463, R² = 0.0776)
Why Gradient Boosting wins: It builds trees iteratively, each one correcting the mistakes of the previous, which makes it better at capturing the subtle patterns in tightly clustered restaurant ratings.
Feature Importance
Top predictors:
- `Number of Ratings` (#1): more reviewed restaurants are more predictable
- `Location` (#2): where a restaurant is matters more than what it serves
- `Cuisine` (#3): cuisine type has some influence
- `Cluster` (#4): our engineered feature added real predictive value!
BONUS: Residual Analysis
- Residuals vs Predicted: Residuals are centered around 0, so the model does not systematically over- or under-predict
- Distribution of Residuals: Near-normal distribution, so the model makes symmetric errors
- The spread reflects the inherent difficulty of predicting ratings in the narrow 3.5–4.5 range
BONUS: Hyperparameter Tuning
Used GridSearchCV with 3-fold cross-validation to find optimal parameters:
- Best params: `learning_rate=0.05`, `max_depth=5`, `n_estimators=100`
- Best MAE: 0.3497, close to the untuned model's 0.3463, which suggests our initial parameters were already well chosen
Part 6: Saved Model
The winning Gradient Boosting Regressor was saved and uploaded to this Hugging Face repository.
| Model | Gradient Boosting Regressor |
| File | swiggy_model.pkl |
| MAE | 0.3463 |
| R² | 0.0776 |
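A `.pkl` file is typically written and read with Python's `pickle` module. The round trip below is a sketch under that assumption: the README does not state whether `pickle` or `joblib` was used, and a placeholder dict stands in for the trained regressor.

```python
import pickle
import tempfile
from pathlib import Path

# Placeholder object; the real file holds the trained Gradient Boosting model.
model = {"kind": "placeholder", "weights": [0.1, 0.2]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "swiggy_model.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)       # serialise to disk
    with path.open("rb") as f:
        restored = pickle.load(f)   # load it back

print(restored == model)  # True
```

Two practical notes: only unpickle files from a source you trust, and load a saved estimator with the same library versions that produced it.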
Part 7: Regression → Classification
Converted Rating into 3 meaningful classes using quantile binning:
| Class | Definition | % of Data |
|---|---|---|
| 0 (Low) | Bottom 33% | 33.8% |
| 1 (Medium) | Middle 33% | 42.6% |
| 2 (High) | Top 33% | 23.7% |
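Why quantile binning? It splits restaurants into Low / Medium / High groups at the rating terciles, aiming for equal-sized classes. The resulting classes are only roughly balanced (33.8% / 42.6% / 23.7%), most likely because many restaurants share the same rating value at the cut points.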
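A small sketch of tercile binning shows how tied values keep the classes from being exactly equal in size. The ratings are hypothetical, and the right-closed interval convention is an assumption; the notebook may have used a different binning call.

```python
from collections import Counter
from statistics import quantiles


def quantile_bins(values, n_bins=3):
    """Label each value by how many cut points lie strictly below it."""
    cuts = quantiles(values, n=n_bins, method="inclusive")
    labels = [sum(v > c for c in cuts) for v in values]
    return labels, cuts


# Hypothetical ratings with a pile-up at 4.0, as is common for review scores.
ratings = [3.0, 3.5, 3.8, 4.0, 4.0, 4.0, 4.0, 4.2, 4.5]
labels, cuts = quantile_bins(ratings)
counts = Counter(labels)
print(sorted(counts.items()))  # [(0, 3), (1, 4), (2, 2)]
```

All four restaurants rated 4.0 must land in the same class, so the middle class ends up larger than a third of the data.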
Why F1 over accuracy? The dataset has mild class imbalance: a model predicting "Medium" for everything would reach about 43% accuracy without learning anything. Weighted F1 is a fairer metric.
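The always-"Medium" argument can be checked with a hand-rolled weighted F1. This sketch uses 100 hypothetical labels in roughly the reported class proportions and treats an undefined precision as 0; it is meant to mirror the usual support-weighted average, not to reproduce the notebook's exact call.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        precision = tp / predicted if predicted else 0.0  # never predicted: count as 0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Hypothetical labels: 34 Low, 43 Medium, 23 High.
y_true = ["Low"] * 34 + ["Medium"] * 43 + ["High"] * 23
y_pred = ["Medium"] * len(y_true)  # a model that learned nothing

acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
print(round(acc, 2), round(f1, 2))  # 0.43 0.26
```

Accuracy rewards the constant prediction with 0.43, while weighted F1 drops to about 0.26 because two of the three classes are never predicted.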
Why Recall matters more here: It's worse to miss a truly great restaurant (False Negative) than to occasionally recommend a mediocre one (False Positive). Predicting Low when the restaurant is actually High hides good restaurants from users.
Part 8: Classification Models
Trained 3 different classification models to predict restaurant rating classes (Low / Medium / High):
| Model | Accuracy | F1 (weighted) |
|---|---|---|
| Logistic Regression | 0.43 | 0.33 |
| Random Forest | 0.45 | 0.46 |
| Gradient Boosting (best) | 0.50 | 0.46 |
Winner: Gradient Boosting Classifier, with the best accuracy (0.50) and an F1 score tied with Random Forest (0.46).
Confusion Matrices
Key observations:
- Logistic Regression: predicted almost everything as "Medium" and struggled with class separation
- Random Forest: more balanced predictions across all 3 classes
- Gradient Boosting: best overall accuracy (50%), with the most correct predictions
Feature Importance (Classification)
- `Number of Ratings` (#1): the most-reviewed restaurants are the most predictable
- `Cuisine_encoded` (#2): cuisine type plays a strong role in rating class
- `Location_encoded` (#3): where a restaurant is matters
- `Cluster` (#4): our engineered feature contributed real signal!
BONUS: Hyperparameter Tuning (Before vs After)
Applied GridSearchCV with 12 combinations and 3-fold cross-validation:
| Metric | Before Tuning | After Tuning |
|---|---|---|
| F1 (weighted) | 0.46 | 0.58 |
| Accuracy | 0.50 | 0.50 |
| Best Params | - | n_estimators=200, max_depth=None, min_samples_split=5 |
+26% improvement in F1 score from tuning alone.
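As a rough illustration of what "12 combinations and 3-fold" costs, a grid search evaluates every parameter combination once per fold. The candidate values below are hypothetical; only the three parameter names come from the tuned result, and the actual grid is not given here.

```python
from itertools import product

# Hypothetical grid: 2 x 3 x 2 = 12 combinations.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5],
}

names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 3
total_fits = len(combinations) * n_folds  # one fit per combination per fold
print(len(combinations), total_fits)  # 12 36
```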
Part 8 Summary
| Model | Accuracy | F1 (weighted) |
|---|---|---|
| Logistic Regression | 0.43 | 0.33 |
| Random Forest | 0.45 | 0.46 |
| Gradient Boosting | 0.50 | 0.46 |
| Gradient Boosting (Tuned, best) | 0.50 | 0.58 |
The same four features (`Number of Ratings`, `Cuisine`, `Location`, `Cluster`) lead both the regression and classification models, which supports our feature engineering choices.
Repository Contents
| File | Description |
|---|---|
| `README.md` | This file |
| `Copy_of_Assignment_2_...ipynb` | Full Colab notebook with all code and outputs |
| `swiggy_model.pkl` | Winning Gradient Boosting Regression model |
| `swiggy_classifier.pkl` | Winning Gradient Boosting Classification model |
Project Summary
This project demonstrates a complete data science pipeline applied to the Swiggy Restaurants Dataset, which covers over 107,000 restaurants across India. Starting from raw restaurant data, we built a system that predicts customer ratings and classifies restaurants into Low, Medium, and High rated groups.
The combination of feature engineering, clustering, and hyperparameter tuning raised regression R² from 0.0114 to 0.0776, roughly 6.8 times the baseline (a relative gain of about 580%), although the best model still explains under 8% of the variance in ratings. Tuning alone improved classification F1 by 26% (0.46 to 0.58).
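Both headline improvements are relative changes, (after - before) / before. A quick check of the arithmetic using the figures reported above:

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`."""
    return (after - before) / before


r2_gain = relative_change(0.0114, 0.0776)  # regression R² gain
f1_gain = relative_change(0.46, 0.58)      # classification F1 gain

print(f"R2: {r2_gain:.0%}, F1: {f1_gain:.0%}")  # R2: 581%, F1: 26%
```

The R² gain looks large in relative terms because the baseline is so close to zero; the absolute gain is about 0.066.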
Author: Amit Ben Avraham | Reichman University, Adelson School of Entrepreneurship | Introduction to Data Science | 2026
AI Usage Disclosure
This project was completed with assistance from Claude (Anthropic) for code debugging, chart design, and README writing. All analysis, decisions, and interpretations are my own.


















