metadata
title: UnsupervisedCustumerPrediction
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit app that predicts cluster labels from uploaded CSV
license: mit
🧩 Clustering Predictor (KMeans / GMM)
This Space predicts cluster labels for uploaded tabular data using a saved pipeline:
- StandardScaler
- PCA (95% explained variance)
- A clustering model (KMeans or Gaussian Mixture Model)
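As a rough illustration of how the three stages fit together, here is a minimal sketch using scikit-learn on synthetic data. The data, the random seeds, and the choice of 9 clusters (inferred only from the `k9` suffix in the model file names) are assumptions for the example, not the settings used to train the saved models.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix (assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))

# 1. Standardize each feature to zero mean and unit variance.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# 2. A float in (0, 1) tells PCA to keep as many components as are needed
#    to explain at least that fraction of the variance.
pca = PCA(n_components=0.95).fit(X_scaled)
X_reduced = pca.transform(X_scaled)

# 3. Fit either clustering model on the reduced features.
#    n_clusters=9 / n_components=9 is a guess based on the "k9" file names.
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(X_reduced)
gmm = GaussianMixture(n_components=9, random_state=0).fit(X_reduced)

kmeans_labels = kmeans.predict(X_reduced)
gmm_labels = gmm.predict(X_reduced)
```

At prediction time only the `transform` and `predict` calls are needed, since the fitted objects are loaded from disk.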
✅ What this app does
- Upload a CSV file
- Checks that all required feature columns are present
- Applies the saved scaler and PCA
- Outputs a predicted cluster label for each row
- Lets you download the predictions as a CSV
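The steps above could be sketched roughly as follows. This is not the code in app.py: the function name `predict_clusters`, the output column name `cluster`, and the demo data are hypothetical, and the fitted objects are created inline here only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def predict_clusters(df, feature_names, scaler, pca, model):
    """Validate columns, apply scaler + PCA, and return one label per row."""
    missing = [c for c in feature_names if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Select features in the stored order so they match the fitted scaler.
    X = df[feature_names].to_numpy(dtype=float)
    X_reduced = pca.transform(scaler.transform(X))
    # "cluster" is a hypothetical column name for this sketch.
    return pd.DataFrame({"cluster": model.predict(X_reduced)}, index=df.index)


# Demo with inline-fitted objects standing in for the saved .pkl files.
feature_names = ["f1", "f2", "f3"]
rng = np.random.default_rng(0)
train = rng.normal(size=(60, 3))
scaler = StandardScaler().fit(train)
pca = PCA(n_components=0.95).fit(scaler.transform(train))
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
    pca.transform(scaler.transform(train))
)

uploaded = pd.DataFrame(rng.normal(size=(5, 3)), columns=feature_names)
result = predict_clusters(uploaded, feature_names, scaler, pca, model)

# CSV text that a download button could serve.
csv_text = result.to_csv(index=False)
```

In a Streamlit app, the uploaded file would typically come from `st.file_uploader` and the CSV text would be passed to `st.download_button`.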
📦 Required files (must be in the repo root)
Place these files next to app.py:
- feature_names.pkl
- scaler.pkl
- pca.pkl
- kmeans_model_k9.pkl (optional, if you want KMeans)
- gmm_model_k9.pkl (optional, if you want GMM)
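One way these files might be loaded is sketched below. It assumes the files were written with the standard `pickle` module; if they were saved with joblib instead, `joblib.load` would be the matching call. The function name `load_artifacts` and the returned structure are hypothetical.

```python
import pickle
from pathlib import Path

REQUIRED = ("feature_names.pkl", "scaler.pkl", "pca.pkl")
OPTIONAL = {"KMeans": "kmeans_model_k9.pkl", "GMM": "gmm_model_k9.pkl"}


def load_artifacts(root="."):
    """Load required artifacts and whichever optional models are present."""
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")

    def _load(name):
        # Assumes plain pickle files; only unpickle files you trust.
        with open(root / name, "rb") as fh:
            return pickle.load(fh)

    # Keys are the file stems: "feature_names", "scaler", "pca".
    artifacts = {Path(name).stem: _load(name) for name in REQUIRED}
    # Only models whose files exist are offered; the dict may be empty.
    models = {
        label: _load(fname)
        for label, fname in OPTIONAL.items()
        if (root / fname).exists()
    }
    return artifacts, models
```

The caller can then offer only the models that were found, for example as the options of a selection widget.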
🧾 Input format
Your CSV must include all feature columns stored in feature_names.pkl.
Optional:
- You may include an id or Id column. If present, it will be included in the output as Id.
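The optional identifier handling described here could look something like the following sketch. The helper names and the `cluster` column name are hypothetical, and checking `id` before `Id` is an arbitrary choice for the example.

```python
import pandas as pd


def extract_id(df):
    """Return the id/Id column renamed to 'Id', or None if neither exists."""
    for name in ("id", "Id"):
        if name in df.columns:
            return df[name].rename("Id")
    return None


def build_output(df, labels):
    """Combine predicted labels with the optional Id column."""
    # "cluster" is a hypothetical output column name.
    out = pd.DataFrame({"cluster": labels}, index=df.index)
    ids = extract_id(df)
    if ids is not None:
        out.insert(0, "Id", ids)  # put the identifier first
    return out


with_id = build_output(pd.DataFrame({"id": [10, 11], "f1": [0.1, 0.2]}), [3, 7])
without_id = build_output(pd.DataFrame({"f1": [0.1, 0.2]}), [3, 7])
```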
▶️ Run locally
pip install -r requirements.txt
streamlit run app.py
📝 Notes
This is an unsupervised project, so cluster quality is evaluated on Kaggle using the leaderboard score.
Visual separation in 2D does not always reflect the Kaggle metric.