Spaces: pranayreddy316/Zero_to_Hero_in_Machine_Learning

pranayreddy316 committed on Apr 7

Commit d00f605 · verified · 1 parent: 5905397

Upload The KNN_Algorithm.py

Files changed (1)

pages/The KNN_Algorithm.py +152 -0

	@@ -0,0 +1,152 @@

+import streamlit as st
+st.set_page_config(page_title="KNN: Classification & Regression")
+st.title("📘 Understanding K-Nearest Neighbors (KNN)")
+# Button-like radio selector
+view = st.radio("🧪 Select Topic Mode", ["📌 About KNN Classification", "📌 About KNN Regression"], horizontal=True)
+if view == "📌 About KNN Classification":
+    st.header("🧠 KNN Classification")
+    st.markdown("""
+    KNN Classification is a **supervised learning algorithm** used to classify data points based on the majority class of their nearest neighbors.
+    ### 🔍 How KNN Classification Works:
+    1. Select a value for **K** (number of nearest neighbors).
+    2. Calculate the **distance** between the query point and training points (using a chosen metric).
+    3. Identify **K nearest neighbors**.
+    4. Assign the **most frequent class** among those neighbors to the query point.
+    ### ✅ Example Use Cases:
+    - Spam vs. non-spam classification
+    - Image recognition (e.g., handwritten digits)
+    - Classifying types of flowers (e.g., Iris dataset)
+    ### 🔧 Important Parameters in KNN Classifier:
+    - **`n_neighbors`**: Number of neighbors to use. A small K follows noise in the training data (overfitting); a large K smooths over class boundaries (underfitting).
+    - **`weights`**: Determines how neighbors contribute to the prediction:
+      - `'uniform'`: All neighbors have equal influence.
+      - `'distance'`: Closer neighbors contribute more.
+    - **`algorithm`**: Strategy to compute nearest neighbors:
+      - `'auto'`, `'ball_tree'`, `'kd_tree'`, `'brute'`
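+    - **`leaf_size`**: Leaf size for `'ball_tree'` and `'kd_tree'`. Affects tree construction speed, query speed, and memory use, not the predictions.
+    - **`p`**: Power parameter for Minkowski distance (used when `metric='minkowski'`):
+      - `p=1`: Manhattan
+      - `p=2`: Euclidean (default)
+    - **`metric`**: Distance metric, e.g., `'minkowski'` (default), `'cosine'`, `'hamming'`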
+    - **`n_jobs`**: Number of CPU cores to use. `-1` means use all available cores.
+    ### 📊 Pros:
+    - Simple and interpretable
+    - No explicit training phase (fitting only stores the data and, optionally, builds a search tree)
+    - Good for small datasets
+    ### ⚠️ Cons:
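+    - Slow predictions on large datasets, since every query searches the stored training set
+    - Requires feature scaling, because features with large ranges dominate the distance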
+    - Sensitive to irrelevant features and noise
+    ### 🔎 Hyperparameter Tuning with Optuna (Concept Only):
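+    ```python
+    import optuna
+    from sklearn.datasets import load_iris
+    from sklearn.model_selection import cross_val_score, train_test_split
+    from sklearn.neighbors import KNeighborsClassifier
+    from sklearn.pipeline import make_pipeline
+    from sklearn.preprocessing import StandardScaler
+    # Example data only (an assumption for this sketch): replace with your own X_train, y_train.
+    X, y = load_iris(return_X_y=True)
+    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
+    def objective(trial):
+        knn = KNeighborsClassifier(
+            n_neighbors=trial.suggest_int('n_neighbors', 1, 30),
+            weights=trial.suggest_categorical('weights', ['uniform', 'distance']),
+            p=trial.suggest_int('p', 1, 2),  # 1 = Manhattan, 2 = Euclidean
+        )
+        # `algorithm` and `leaf_size` affect search speed rather than accuracy, so they are not tuned here.
+        # Scaling inside the pipeline is refit on each CV fold, which avoids leakage.
+        model = make_pipeline(StandardScaler(), knn)
+        return cross_val_score(model, X_train, y_train, cv=5).mean()  # mean accuracy
+    study = optuna.create_study(direction='maximize')
+    study.optimize(objective, n_trials=50)
+    print(study.best_params)
+    ```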
+    """)
+elif view == "📌 About KNN Regression":
+    st.header("📊 KNN Regression")
+    st.markdown("""
+    KNN Regression predicts **continuous values** by averaging the target values of the nearest neighbors.
+    ### 🔍 How KNN Regression Works:
+    1. Choose a value for **K**.
+    2. Calculate the distance between the input and training samples.
+    3. Pick K nearest data points.
+    4. Predict the output as the **mean (or weighted mean)** of neighbors' target values.
+    ### ✅ Example Use Cases:
+    - House price prediction
+    - Forecasting temperature or humidity
+    - Predicting sales or stock values
+    ### 🔧 Important Parameters (same as Classification):
+    - **`n_neighbors`**: Number of neighbors used in averaging
+    - **`weights`**:
+      - `'uniform'`: Equal contribution
+      - `'distance'`: Nearer points have more weight
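+    - **`algorithm`**: Nearest neighbor search strategy (`'auto'`, `'ball_tree'`, `'kd_tree'`, `'brute'`)
+    - **`leaf_size`**: Leaf size for the tree-based algorithms; affects speed and memory, not the predictions
+    - **`p`**: Power parameter for Minkowski distance (`p=1` Manhattan, `p=2` Euclidean)
+    - **`metric`**: Distance metric, e.g., `'minkowski'` (default)
+    - **`n_jobs`**: Number of CPU cores to use. `-1` means use all available cores.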
+    ### 📊 Pros:
+    - Captures non-linear relationships without assuming a functional form
+    - Flexible and intuitive
+    ### ⚠️ Cons:
+    - Sensitive to irrelevant features
+    - Slow prediction time for large datasets
+    ### 🔎 Hyperparameter Tuning with Optuna (Concept Only):
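+    ```python
+    import optuna
+    from sklearn.datasets import load_diabetes
+    from sklearn.model_selection import cross_val_score, train_test_split
+    from sklearn.neighbors import KNeighborsRegressor
+    from sklearn.pipeline import make_pipeline
+    from sklearn.preprocessing import StandardScaler
+    # Example data only (an assumption for this sketch): replace with your own X_train, y_train.
+    X, y = load_diabetes(return_X_y=True)
+    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
+    def objective(trial):
+        knn = KNeighborsRegressor(
+            n_neighbors=trial.suggest_int('n_neighbors', 1, 30),
+            weights=trial.suggest_categorical('weights', ['uniform', 'distance']),
+            p=trial.suggest_int('p', 1, 2),  # 1 = Manhattan, 2 = Euclidean
+        )
+        # `algorithm` and `leaf_size` affect search speed rather than accuracy, so they are not tuned here.
+        model = make_pipeline(StandardScaler(), knn)
+        # Negated MSE: values closer to 0 are better, hence direction='maximize'.
+        return cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error').mean()
+    study = optuna.create_study(direction='maximize')
+    study.optimize(objective, n_trials=50)
+    print(study.best_params)
+    ```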
+    """)
+st.markdown("""
+---
+## 🔁 Classification vs Regression Summary
+| Feature             | KNN Classification                | KNN Regression                     |
+|--------------------|-----------------------------------|------------------------------------|
+| Output Type        | Class label (categorical)         | Numeric value (continuous)         |
+| Decision Mechanism | Majority vote                     | Mean of neighbors                  |
+| Metrics            | Accuracy, F1, ROC-AUC             | RMSE, MAE, R² Score                |
+---
+## 🧪 Tips for Both Models
+- **Always scale your features** using StandardScaler or MinMaxScaler.
+- **Use Optuna/GridSearchCV** for tuning hyperparameters.
+- **Use PCA or feature selection** in high-dimensional data.
+- **Remove noise and irrelevant features** before training.
+- **For large datasets**, subsample the training data or use tree-based search (`kd_tree`, `ball_tree`) to speed up queries.
+---
+✨ KNN is a simple, interpretable lazy-learning algorithm that works well when features are scaled and K is tuned.
+""")
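
The four steps listed under "How KNN Classification Works" and "How KNN Regression Works" in the file above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the scikit-learn implementation and not part of the commit: the function names (`minkowski`, `k_nearest`, `knn_classify`, `knn_regress`) and the toy data are hypothetical, and only uniform weights are shown.

```python
from collections import Counter
from statistics import mean


def minkowski(a, b, p=2):
    """Minkowski distance between two points (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def k_nearest(X_train, query, k, p=2):
    """Steps 2 and 3: compute distances to every training point, keep the k closest indices."""
    order = sorted(range(len(X_train)), key=lambda i: minkowski(X_train[i], query, p))
    return order[:k]


def knn_classify(X_train, y_train, query, k=3, p=2):
    """Step 4 (classification): majority vote among the k nearest neighbors."""
    votes = Counter(y_train[i] for i in k_nearest(X_train, query, k, p))
    return votes.most_common(1)[0][0]


def knn_regress(X_train, y_train, query, k=3, p=2):
    """Step 4 (regression): unweighted mean of the k nearest neighbors' targets."""
    return mean(y_train[i] for i in k_nearest(X_train, query, k, p))


# Toy data (hypothetical): two well-separated clusters.
X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.5], [8.5, 9.5]]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [10.0, 12.0, 11.0, 50.0, 52.0, 54.0]

# Step 1: choose K, then query a point near the first cluster.
predicted_class = knn_classify(X, labels, query=[1.2, 1.4], k=3)
predicted_value = knn_regress(X, targets, query=[1.2, 1.4], k=3)
print(predicted_class, predicted_value)
```

The brute-force sort here corresponds to the `'brute'` search strategy; the tree-based options change how the neighbors are found, not which neighbors are returned.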