pranayreddy316 committed · verified
Commit 152759e · 1 Parent(s): d00f605

Upload The_Decision_Tree_Algorithm.py

Files changed (1)
  1. pages/The_Decision_Tree_Algorithm.py +154 -0
pages/The_Decision_Tree_Algorithm.py ADDED
@@ -0,0 +1,154 @@
+ import streamlit as st
+
+ st.set_page_config(page_title="Decision Tree: Classification & Regression")
+
+ st.title("🌳 Understanding Decision Tree Algorithms")
+
+ # Button-like radio selector
+ view = st.radio("🧪 Select Topic Mode", ["📌 About Decision Tree Classification", "📌 About Decision Tree Regression"], horizontal=True)
+
+ if view == "📌 About Decision Tree Classification":
+     st.header("🧠 Decision Tree Classification")
+
+     st.markdown("""
+ Decision Tree Classification is a **supervised learning algorithm** used for classifying data by learning simple decision rules inferred from the data features.
+
+ ### 🔍 How Decision Tree Classification Works:
+ 1. Select the **best feature** to split the dataset based on a criterion (e.g., Gini impurity or entropy).
+ 2. Split the dataset into subsets.
+ 3. Repeat recursively for each subset until a stopping condition is met (e.g., max depth, minimum samples).
+
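+ To make the splitting concrete, here is a minimal fit-and-inspect sketch (illustrative only; the Iris dataset and `max_depth=3` are arbitrary choices, not part of this app):
+
+ ```python
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier, export_text
+
+ X, y = load_iris(return_X_y=True)
+ clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42).fit(X, y)
+
+ # The learned rules show the recursive feature/threshold splits
+ print(export_text(clf, feature_names=load_iris().feature_names))
+ print(clf.predict(X[:2]))  # predicted class labels for the first two samples
+ ```
+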
+ ### ✅ Example Use Cases:
+ - Medical diagnosis (disease prediction)
+ - Customer churn prediction
+ - Loan approval classification
+
+ ### 🔧 Important Parameters in `DecisionTreeClassifier`:
+ - **`criterion`**: Metric to evaluate the quality of a split:
+   - `'gini'`: Gini Impurity (default)
+   - `'entropy'`: Information Gain
+ - **`max_depth`**: Maximum depth of the tree
+ - **`min_samples_split`**: Minimum number of samples to split an internal node
+ - **`min_samples_leaf`**: Minimum number of samples in a leaf node
+ - **`max_features`**: Number of features to consider when looking for the best split
+ - **`random_state`**: Controls the randomness of the estimator
+
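+ For illustration, a small sketch of how two of these parameters constrain tree growth (the values 3 and 5 are arbitrary):
+
+ ```python
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier
+
+ X, y = load_iris(return_X_y=True)
+
+ # An unconstrained tree vs. one limited by max_depth and min_samples_leaf
+ full = DecisionTreeClassifier(random_state=42).fit(X, y)
+ pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42).fit(X, y)
+ print(full.get_depth(), pruned.get_depth())  # the constrained tree is shallower
+ ```
+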
+ ### 📊 Pros:
+ - Easy to understand and interpret
+ - Requires little data preparation
+ - Handles both numerical and categorical data (scikit-learn requires categorical features to be encoded numerically)
+
+ ### ⚠️ Cons:
+ - Prone to overfitting
+ - Unstable: small changes in the data can produce a very different tree
+ - Biased toward majority classes on imbalanced datasets
+
+ ### 🔎 Hyperparameter Tuning with Optuna (Example with Iris Dataset):
+
+ ```python
+ import optuna
+ from sklearn.datasets import load_iris
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.tree import DecisionTreeClassifier
+
+ # Load and split data
+ data = load_iris()
+ X_train, _, y_train, _ = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
+
+ def objective(trial):
+     model = DecisionTreeClassifier(
+         criterion=trial.suggest_categorical('criterion', ['gini', 'entropy']),
+         max_depth=trial.suggest_int('max_depth', 1, 20),
+         min_samples_split=trial.suggest_int('min_samples_split', 2, 20),
+         min_samples_leaf=trial.suggest_int('min_samples_leaf', 1, 20),
+         max_features=trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
+         random_state=42
+     )
+     return cross_val_score(model, X_train, y_train, cv=5).mean()
+
+ study = optuna.create_study(direction='maximize')
+ study.optimize(objective, n_trials=50)
+ ```
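+
+ Once the study finishes, one way to refit with the best parameters (a sketch continuing from the snippet above; `study.best_params` holds only the tuned arguments):
+
+ ```python
+ best_model = DecisionTreeClassifier(**study.best_params, random_state=42)
+ best_model.fit(X_train, y_train)
+ print(study.best_value)  # best mean cross-validated accuracy found
+ ```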
+ """)
+
+ elif view == "📌 About Decision Tree Regression":
+     st.header("📊 Decision Tree Regression")
+
+     st.markdown("""
+ Decision Tree Regression is used for **predicting continuous values** by learning decision rules that split the data to minimize variance.
+
+ ### 🔍 How Decision Tree Regression Works:
+ 1. The dataset is split on the feature and threshold that minimize variance (e.g., MSE).
+ 2. These splits form decision nodes and terminal leaves that hold the average target value of their samples.
+ 3. The tree is built recursively until a stopping condition is reached.
+
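+ The averaging at the leaves can be seen on a tiny toy example (a sketch; the numbers are made up for illustration):
+
+ ```python
+ import numpy as np
+ from sklearn.tree import DecisionTreeRegressor
+
+ X = np.array([[1], [2], [3], [10], [11], [12]])
+ y = np.array([1.0, 1.2, 0.8, 9.0, 11.0, 10.0])
+
+ # A single split separates the two clusters; each leaf predicts its mean
+ reg = DecisionTreeRegressor(max_depth=1).fit(X, y)
+ print(reg.predict([[2], [11]]))  # -> [ 1. 10.]
+ ```
+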
+ ### ✅ Example Use Cases:
+ - Predicting housing prices
+ - Estimating demand or consumption
+ - Financial forecasting
+
+ ### 🔧 Important Parameters in `DecisionTreeRegressor`:
+ - **`criterion`**: Function to measure the quality of a split:
+   - `'squared_error'`, `'absolute_error'`, `'friedman_mse'`, `'poisson'`
+ - **`max_depth`**, **`min_samples_split`**, **`min_samples_leaf`**, **`max_features`**, **`random_state`** (same as the classifier)
+
+ ### 📊 Pros:
+ - Simple, yet able to capture non-linear relationships
+ - No need for feature scaling
+ - Flexible with respect to feature types
+
+ ### ⚠️ Cons:
+ - Overfitting is common without pruning
+ - Sensitive to outliers and noise
+
+ ### 🔎 Hyperparameter Tuning with Optuna (Example with California Housing Dataset):
+
+ ```python
+ import optuna
+ from sklearn.datasets import fetch_california_housing
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.tree import DecisionTreeRegressor
+
+ # Load and split data
+ data = fetch_california_housing()
+ X_train, _, y_train, _ = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
+
+ def objective(trial):
+     model = DecisionTreeRegressor(
+         criterion=trial.suggest_categorical('criterion', ['squared_error', 'friedman_mse', 'absolute_error']),
+         max_depth=trial.suggest_int('max_depth', 1, 20),
+         min_samples_split=trial.suggest_int('min_samples_split', 2, 20),
+         min_samples_leaf=trial.suggest_int('min_samples_leaf', 1, 20),
+         max_features=trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
+         random_state=42
+     )
+     return cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error').mean()
+
+ study = optuna.create_study(direction='maximize')
+ study.optimize(objective, n_trials=50)
+ ```
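+
+ Because the objective maximizes the negative MSE, the best score can be turned back into an error after the search (a short follow-up sketch):
+
+ ```python
+ best_rmse = (-study.best_value) ** 0.5
+ print(study.best_params, best_rmse)
+ ```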
+ """)
+
+ st.markdown("""
+ ---
+
+ ## 🧠 Classification vs Regression Summary
+
+ | Feature            | Decision Tree Classification | Decision Tree Regression   |
+ |--------------------|------------------------------|----------------------------|
+ | Output Type        | Class label (categorical)    | Numeric value (continuous) |
+ | Decision Mechanism | Gini/Entropy-based splits    | MSE/MAE-based splits       |
+ | Metrics            | Accuracy, F1, ROC-AUC        | RMSE, MAE, R² Score        |
+
+ ---
+
+ ## 🧪 Tips for Both Models
+ - **Prune your tree** with max depth or minimum samples.
+ - **Use GridSearchCV or Optuna** for hyperparameter tuning.
+ - **Be cautious of overfitting**, especially on small datasets.
+ - **Visualize trees** with `plot_tree` from `sklearn.tree` (see the sketch below).
+
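+ A quick visualization sketch (the dataset and depth are arbitrary choices for illustration):
+
+ ```python
+ import matplotlib.pyplot as plt
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier, plot_tree
+
+ X, y = load_iris(return_X_y=True)
+ tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
+
+ plot_tree(tree, filled=True)  # draws the fitted tree on the current matplotlib axes
+ plt.show()
+ ```
+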
+ ---
+
+ ✨ Decision Trees offer an intuitive, powerful approach to both classification and regression — especially when tuned and interpreted effectively!
+ """)