pranayreddy316 committed on
Commit d00f605 · verified · 1 Parent(s): 5905397

Upload The KNN_Algorithm.py

Files changed (1)
  1. pages/The KNN_Algorithm.py +152 -0

pages/The KNN_Algorithm.py ADDED
@@ -0,0 +1,152 @@
+ import streamlit as st
+
+ st.set_page_config(page_title="KNN: Classification & Regression")
+
+ st.title("📘 Understanding K-Nearest Neighbors (KNN)")
+
+ # Button-like radio selector
+ view = st.radio("🧪 Select Topic Mode", ["📌 About KNN Classification", "📌 About KNN Regression"], horizontal=True)
+
+ if view == "📌 About KNN Classification":
+     st.header("🧠 KNN Classification")
+
+     st.markdown("""
+ KNN Classification is a **supervised learning algorithm** that classifies data points based on the majority class of their nearest neighbors.
+
+ ### 🔍 How KNN Classification Works:
+ 1. Select a value for **K** (the number of nearest neighbors).
+ 2. Calculate the **distance** between the query point and every training point (using a chosen metric).
+ 3. Identify the **K nearest neighbors**.
+ 4. Assign the **most frequent class** among those neighbors to the query point, as in the sketch below.
+
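+ A minimal from-scratch sketch of these four steps (illustrative only; `knn_classify` and the toy data are invented for this page):
+
+ ```python
+ import numpy as np
+ from collections import Counter
+
+ def knn_classify(X_train, y_train, x_query, k=3):
+     # Step 2: Euclidean distance from the query to every training point
+     distances = np.linalg.norm(X_train - x_query, axis=1)
+     # Step 3: indices of the k closest training points
+     nearest = np.argsort(distances)[:k]
+     # Step 4: majority vote among the k neighbors' labels
+     return Counter(y_train[nearest]).most_common(1)[0][0]
+
+ # Toy data: two clusters labeled 0 and 1
+ X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
+ y = np.array([0, 0, 0, 1, 1, 1])
+ print(knn_classify(X, y, np.array([2, 2]), k=3))  # -> 0
+ ```
+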
+ ### ✅ Example Use Cases:
+ - Spam vs. non-spam classification
+ - Image recognition (e.g., handwritten digits)
+ - Classifying types of flowers (e.g., the Iris dataset)
+
+ ### 🔧 Important Parameters in the KNN Classifier:
+ - **`n_neighbors`**: Number of neighbors to use. A small K may overfit; a large K may underfit.
+ - **`weights`**: Determines how neighbors contribute to the prediction:
+   - `'uniform'`: All neighbors have equal influence.
+   - `'distance'`: Closer neighbors contribute more.
+ - **`algorithm`**: Strategy for computing nearest neighbors:
+   - `'auto'`, `'ball_tree'`, `'kd_tree'`, `'brute'`
+ - **`leaf_size`**: Affects the speed of tree-based search.
+ - **`p`**: Power parameter for the Minkowski distance:
+   - `p=1`: Manhattan
+   - `p=2`: Euclidean (default)
+ - **`metric`**: Distance metric, e.g. `'minkowski'`, `'cosine'`, `'hamming'`
+ - **`n_jobs`**: Number of CPU cores to use; `-1` means all available cores (see the example below).
+
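+ Wired up in scikit-learn, a classifier using these parameters might look like this (the values are arbitrary examples, not recommendations):
+
+ ```python
+ from sklearn.neighbors import KNeighborsClassifier
+
+ clf = KNeighborsClassifier(
+     n_neighbors=5,        # K
+     weights='distance',   # closer neighbors count more
+     algorithm='auto',     # let scikit-learn pick the search structure
+     leaf_size=30,         # node size for ball_tree / kd_tree
+     p=2,                  # Euclidean distance
+     metric='minkowski',
+     n_jobs=-1,            # use all CPU cores
+ )
+ ```
+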
+ ### 📊 Pros:
+ - Simple and interpretable
+ - No training phase (lazy learning)
+ - Good for small datasets
+
+ ### ⚠️ Cons:
+ - Slow at prediction time for large datasets
+ - Requires feature scaling
+ - Sensitive to irrelevant features and noise
+
+ ### 🔎 Hyperparameter Tuning with Optuna (Concept Only):
+ ```python
+ import optuna
+ from sklearn.neighbors import KNeighborsClassifier
+ from sklearn.model_selection import cross_val_score
+
+ # X_train and y_train are assumed to be defined elsewhere
+ def objective(trial):
+     model = KNeighborsClassifier(
+         n_neighbors=trial.suggest_int('n_neighbors', 1, 30),
+         weights=trial.suggest_categorical('weights', ['uniform', 'distance']),
+         p=trial.suggest_int('p', 1, 2),
+         algorithm=trial.suggest_categorical('algorithm', ['auto', 'ball_tree', 'kd_tree', 'brute']),
+         leaf_size=trial.suggest_int('leaf_size', 10, 100)
+     )
+     return cross_val_score(model, X_train, y_train, cv=5).mean()
+
+ study = optuna.create_study(direction='maximize')
+ study.optimize(objective, n_trials=50)
+ # study.best_params then holds the chosen settings
+ ```
+ """)
+
+ elif view == "📌 About KNN Regression":
+     st.header("📊 KNN Regression")
+
+     st.markdown("""
+ KNN Regression predicts **continuous values** by averaging the target values of the nearest neighbors.
+
+ ### 🔍 How KNN Regression Works:
+ 1. Choose a value for **K**.
+ 2. Calculate the distance between the input and every training sample.
+ 3. Pick the K nearest data points.
+ 4. Predict the output as the **mean (or weighted mean)** of the neighbors' target values, as in the sketch below.
+
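+ A minimal from-scratch sketch of these steps (illustrative only; `knn_regress` and the toy data are invented for this page):
+
+ ```python
+ import numpy as np
+
+ def knn_regress(X_train, y_train, x_query, k=3):
+     # Distance from the query to every training sample
+     distances = np.linalg.norm(X_train - x_query, axis=1)
+     # Indices of the k closest samples
+     nearest = np.argsort(distances)[:k]
+     # Prediction is the plain mean of the k neighbors' targets
+     return y_train[nearest].mean()
+
+ # Toy 1-D data with a roughly linear trend
+ X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
+ y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
+ print(knn_regress(X, y, np.array([2.5]), k=2))  # -> 2.55
+ ```
+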
+ ### ✅ Example Use Cases:
+ - House price prediction
+ - Forecasting temperature or humidity
+ - Predicting sales or stock values
+
+ ### 🔧 Important Parameters (same as Classification):
+ - **`n_neighbors`**: Number of neighbors used in the average
+ - **`weights`**:
+   - `'uniform'`: Equal contribution
+   - `'distance'`: Nearer points get more weight
+ - **`algorithm`**: Nearest-neighbor search strategy
+ - **`leaf_size`**: Speed control for tree-based search
+ - **`p`**: Power parameter of the Minkowski distance
+ - **`metric`**: Type of distance
+ - **`n_jobs`**: CPU cores to use (see the sketch below)
+
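+ The same idea through the scikit-learn API, end to end (toy values only):
+
+ ```python
+ import numpy as np
+ from sklearn.neighbors import KNeighborsRegressor
+
+ X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
+ y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
+
+ reg = KNeighborsRegressor(
+     n_neighbors=2,
+     weights='distance',  # weighted mean: nearer points matter more
+ )
+ reg.fit(X, y)
+ print(reg.predict([[2.5]]))  # weighted mean of the two nearest targets
+ ```
+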
+ ### 📊 Pros:
+ - Captures non-linear relationships without an explicit equation
+ - Flexible and intuitive
+
+ ### ⚠️ Cons:
+ - Sensitive to irrelevant features
+ - Slow prediction time on large datasets
+
+ ### 🔎 Hyperparameter Tuning with Optuna (Concept Only):
+ ```python
+ import optuna
+ from sklearn.neighbors import KNeighborsRegressor
+ from sklearn.model_selection import cross_val_score
+
+ # X_train and y_train are assumed to be defined elsewhere
+ def objective(trial):
+     model = KNeighborsRegressor(
+         n_neighbors=trial.suggest_int('n_neighbors', 1, 30),
+         weights=trial.suggest_categorical('weights', ['uniform', 'distance']),
+         p=trial.suggest_int('p', 1, 2),
+         algorithm=trial.suggest_categorical('algorithm', ['auto', 'ball_tree', 'kd_tree', 'brute']),
+         leaf_size=trial.suggest_int('leaf_size', 10, 100)
+     )
+     # Negated MSE: maximizing it is the same as minimizing MSE
+     return cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error').mean()
+
+ study = optuna.create_study(direction='maximize')
+ study.optimize(objective, n_trials=50)
+ ```
+ """)
+
+ st.markdown("""
+ ---
+
+ ## 🔁 Classification vs Regression Summary
+
+ | Feature | KNN Classification | KNN Regression |
+ |--------------------|-----------------------------------|------------------------------------|
+ | Output Type | Class label (categorical) | Numeric value (continuous) |
+ | Decision Mechanism | Majority vote | Mean of neighbors |
+ | Metrics | Accuracy, F1, ROC-AUC | RMSE, MAE, R² Score |
+
+ ---
+
+ ## 🧪 Tips for Both Models
+ - **Always scale your features** using StandardScaler or MinMaxScaler (see the pipeline sketch below).
+ - **Use Optuna or GridSearchCV** to tune hyperparameters.
+ - **Use PCA or feature selection** for high-dimensional data.
+ - **Remove noise and irrelevant features** before training.
+ - **For scalability**, subsample the data or use fast search structures (`kd_tree`, `ball_tree`).
+
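+ A minimal scaling-plus-KNN pipeline, as a sketch (the step names `'scale'` and `'knn'` are our own):
+
+ ```python
+ from sklearn.pipeline import Pipeline
+ from sklearn.preprocessing import StandardScaler
+ from sklearn.neighbors import KNeighborsClassifier
+
+ # Scaling first keeps large-range features from dominating the distance
+ pipe = Pipeline([
+     ('scale', StandardScaler()),
+     ('knn', KNeighborsClassifier(n_neighbors=5)),
+ ])
+ # pipe.fit(X_train, y_train); pipe.predict(X_test)
+ ```
+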
+ ---
+
+ ✨ KNN is a solid, interpretable, and powerful lazy-learning algorithm, especially when tuned correctly!
+ """)