pranayreddy316 committed · verified
Commit 152759e · 1 Parent(s): d00f605

Upload The_Decision_Tree_Algorithm.py

Files changed (1)
  1. pages/The_Decision_Tree_Algorithm.py +154 -0
pages/The_Decision_Tree_Algorithm.py ADDED
@@ -0,0 +1,154 @@
+ import streamlit as st
+
+ st.set_page_config(page_title="Decision Tree: Classification & Regression")
+
+ st.title("🌳 Understanding Decision Tree Algorithms")
+
+ # Button-like radio selector
+ view = st.radio("🧪 Select Topic Mode", ["📌 About Decision Tree Classification", "📌 About Decision Tree Regression"], horizontal=True)
+
+ if view == "📌 About Decision Tree Classification":
+     st.header("🧠 Decision Tree Classification")
+
+     st.markdown("""
+ Decision Tree Classification is a **supervised learning algorithm** used for classifying data by learning simple decision rules inferred from the data features.
+
+ ### 🔍 How Decision Tree Classification Works:
+ 1. Select the **best feature** to split the dataset based on a criterion (e.g., Gini impurity or entropy).
+ 2. Split the dataset into subsets.
+ 3. Repeat recursively for each subset until a stopping condition is met (e.g., max depth, minimum samples).
+
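+ To make the splitting concrete, here is a minimal fit-and-inspect sketch (illustrative only; the Iris dataset and `max_depth=3` are arbitrary choices, not part of this app):
+
+ ```python
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier, export_text
+
+ X, y = load_iris(return_X_y=True)
+ clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42).fit(X, y)
+
+ # The learned rules show the recursive feature/threshold splits
+ print(export_text(clf, feature_names=load_iris().feature_names))
+ print(clf.predict(X[:2]))  # predicted class labels for the first two samples
+ ```
+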
+ ### ✅ Example Use Cases:
+ - Medical diagnosis (disease prediction)
+ - Customer churn prediction
+ - Loan approval classification
+
+ ### 🔧 Important Parameters in `DecisionTreeClassifier`:
+ - **`criterion`**: Metric to evaluate the quality of a split:
+   - `'gini'`: Gini Impurity (default)
+   - `'entropy'`: Information Gain
+ - **`max_depth`**: Maximum depth of the tree
+ - **`min_samples_split`**: Minimum number of samples to split an internal node
+ - **`min_samples_leaf`**: Minimum number of samples in a leaf node
+ - **`max_features`**: Number of features to consider when looking for the best split
+ - **`random_state`**: Controls the randomness of the estimator
+
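+ For illustration, a small sketch of how two of these parameters constrain tree growth (the values 3 and 5 are arbitrary):
+
+ ```python
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier
+
+ X, y = load_iris(return_X_y=True)
+
+ # An unconstrained tree vs. one limited by max_depth and min_samples_leaf
+ full = DecisionTreeClassifier(random_state=42).fit(X, y)
+ pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42).fit(X, y)
+ print(full.get_depth(), pruned.get_depth())  # the constrained tree is shallower
+ ```
+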
+ ### 📊 Pros:
+ - Easy to understand and interpret
+ - Requires little data preparation
+ - Handles both numerical and categorical data (scikit-learn requires categorical features to be encoded numerically)
+
+ ### ⚠️ Cons:
+ - Prone to overfitting
+ - Unstable: small changes in the data can produce a very different tree
+ - Biased toward majority classes on imbalanced datasets
+
+ ### 🔎 Hyperparameter Tuning with Optuna (Example with Iris Dataset):
+
+ ```python
+ import optuna
+ from sklearn.datasets import load_iris
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.tree import DecisionTreeClassifier
+
+ # Load and split data
+ data = load_iris()
+ X_train, _, y_train, _ = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
+
+ def objective(trial):
+     model = DecisionTreeClassifier(
+         criterion=trial.suggest_categorical('criterion', ['gini', 'entropy']),
+         max_depth=trial.suggest_int('max_depth', 1, 20),
+         min_samples_split=trial.suggest_int('min_samples_split', 2, 20),
+         min_samples_leaf=trial.suggest_int('min_samples_leaf', 1, 20),
+         max_features=trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
+         random_state=42
+     )
+     return cross_val_score(model, X_train, y_train, cv=5).mean()
+
+ study = optuna.create_study(direction='maximize')
+ study.optimize(objective, n_trials=50)
+ ```
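+
+ Once the study finishes, one way to refit with the best parameters (a sketch continuing from the snippet above; `study.best_params` holds only the tuned arguments):
+
+ ```python
+ best_model = DecisionTreeClassifier(**study.best_params, random_state=42)
+ best_model.fit(X_train, y_train)
+ print(study.best_value)  # best mean cross-validated accuracy found
+ ```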
+ """)
+
+ elif view == "📌 About Decision Tree Regression":
+     st.header("📊 Decision Tree Regression")
+
+     st.markdown("""
+ Decision Tree Regression is used for **predicting continuous values** by learning decision rules that split the data to minimize variance.
+
+ ### 🔍 How Decision Tree Regression Works:
+ 1. The dataset is split on the feature and threshold that minimize variance (e.g., MSE).
+ 2. These splits form decision nodes and terminal leaves that hold the average target value of their samples.
+ 3. The tree is built recursively until a stopping condition is reached.
+
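+ The averaging at the leaves can be seen on a tiny toy example (a sketch; the numbers are made up for illustration):
+
+ ```python
+ import numpy as np
+ from sklearn.tree import DecisionTreeRegressor
+
+ X = np.array([[1], [2], [3], [10], [11], [12]])
+ y = np.array([1.0, 1.2, 0.8, 9.0, 11.0, 10.0])
+
+ # A single split separates the two clusters; each leaf predicts its mean
+ reg = DecisionTreeRegressor(max_depth=1).fit(X, y)
+ print(reg.predict([[2], [11]]))  # -> [ 1. 10.]
+ ```
+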
+ ### ✅ Example Use Cases:
+ - Predicting housing prices
+ - Estimating demand or consumption
+ - Financial forecasting
+
+ ### 🔧 Important Parameters in `DecisionTreeRegressor`:
+ - **`criterion`**: Function to measure the quality of a split:
+   - `'squared_error'`, `'absolute_error'`, `'friedman_mse'`, `'poisson'`
+ - **`max_depth`**, **`min_samples_split`**, **`min_samples_leaf`**, **`max_features`**, **`random_state`** (same as the classifier)
+
+ ### 📊 Pros:
+ - Simple, yet able to capture non-linear relationships
+ - No need for feature scaling
+ - Flexible with respect to feature types
+
+ ### ⚠️ Cons:
+ - Overfitting is common without pruning
+ - Sensitive to outliers and noise
+
+ ### 🔎 Hyperparameter Tuning with Optuna (Example with California Housing Dataset):
+
+ ```python
+ import optuna
+ from sklearn.datasets import fetch_california_housing
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.tree import DecisionTreeRegressor
+
+ # Load and split data
+ data = fetch_california_housing()
+ X_train, _, y_train, _ = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
+
+ def objective(trial):
+     model = DecisionTreeRegressor(
+         criterion=trial.suggest_categorical('criterion', ['squared_error', 'friedman_mse', 'absolute_error']),
+         max_depth=trial.suggest_int('max_depth', 1, 20),
+         min_samples_split=trial.suggest_int('min_samples_split', 2, 20),
+         min_samples_leaf=trial.suggest_int('min_samples_leaf', 1, 20),
+         max_features=trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
+         random_state=42
+     )
+     return cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error').mean()
+
+ study = optuna.create_study(direction='maximize')
+ study.optimize(objective, n_trials=50)
+ ```
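+
+ Because the objective maximizes the negative MSE, the best score can be turned back into an error after the search (a short follow-up sketch):
+
+ ```python
+ best_rmse = (-study.best_value) ** 0.5
+ print(study.best_params, best_rmse)
+ ```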
+ """)
+
+ st.markdown("""
+ ---
+
+ ## 🧠 Classification vs Regression Summary
+
+ | Feature            | Decision Tree Classification | Decision Tree Regression   |
+ |--------------------|------------------------------|----------------------------|
+ | Output Type        | Class label (categorical)    | Numeric value (continuous) |
+ | Decision Mechanism | Gini/Entropy-based splits    | MSE/MAE-based splits       |
+ | Metrics            | Accuracy, F1, ROC-AUC        | RMSE, MAE, R² Score        |
+
+ ---
+
+ ## 🧪 Tips for Both Models
+ - **Prune your tree** with max depth or minimum samples.
+ - **Use GridSearchCV or Optuna** for hyperparameter tuning.
+ - **Be cautious of overfitting**, especially on small datasets.
+ - **Visualize trees** with `plot_tree` from `sklearn.tree` (see the sketch below).
+
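+ A quick visualization sketch (the dataset and depth are arbitrary choices for illustration):
+
+ ```python
+ import matplotlib.pyplot as plt
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier, plot_tree
+
+ X, y = load_iris(return_X_y=True)
+ tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
+
+ plot_tree(tree, filled=True)  # draws the fitted tree on the current matplotlib axes
+ plt.show()
+ ```
+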
+ ---
+
+ ✨ Decision Trees offer an intuitive, powerful approach to both classification and regression — especially when tuned and interpreted effectively!
+ """)