import streamlit as st

st.title(":red[**Introduction to Ensemble Learning**]")

st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.

The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.

---

### Why Use Ensemble Methods?
- Improves performance and stability.
- Reduces the risk of overfitting.
- Works well in both classification and regression tasks.
- Often wins data science competitions (e.g., Kaggle).

---

### Common Ensemble Techniques
1. **Bagging** (Bootstrap Aggregating)
   - Builds multiple models in parallel.
   - Reduces **variance**.
   - Example: `RandomForest`
2. **Boosting**
   - Builds models sequentially, each correcting errors from the previous.
   - Reduces **bias**.
   - Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`
3. **Stacking**
   - Combines different model types.
   - A meta-model learns how to best combine them (a code sketch follows below).

---

### Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).

---

**In short:** Ensemble learning = smarter predictions by making models work together.
""")
| st.subheader(":blue[**Voting Ensemble (Classifier)**]") | |
| st.markdown(""" | |
| In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**. | |
| --- | |
| ### Types of Voting: | |
| #### Hard Voting | |
| - Each model votes for a class label. | |
| - The final prediction is the **majority vote**. | |
| - Useful when all models are equally good. | |
| #### Soft Voting | |
| - Uses **predicted probabilities** from models. | |
| - Averages probabilities and picks the class with the **highest average probability**. | |
| - Works best when base models are **well-calibrated**. | |
| --- | |
| ### Why Use Voting? | |
| - Combines **strengths** of different models. | |
| - Reduces the **risk of overfitting**. | |
| - Often **improves accuracy** over individual models. | |
| """) | |
| st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]") | |
| st.markdown(""" | |
| **Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms. | |
| It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees. | |
| --- | |
| ### How It Works: | |
| 1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement). | |
| 2. Train a **separate model** on each subset. | |
| 3. Aggregate the predictions: | |
| - For **classification**: majority vote. | |
| - For **regression**: average. | |
| --- | |
| ### Key Points: | |
| - Models are trained **independently and in parallel**. | |
| - Often used with **Decision Trees**. | |
| - Final prediction is **more robust** than any individual model. | |
| --- | |
| ### Example: | |
| A well-known example of Bagging is the **Random Forest** algorithm: | |
| - Uses multiple decision trees trained on bootstrapped samples. | |
| - Adds feature randomness for further diversity. | |
| """) | |
| st.title("What is Random Forest?") | |
| st.markdown(""" | |
| **Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions. | |
| It is based on the **Bagging** technique and introduces **randomness** at two levels: | |
| - Random sampling of data (bootstrap samples). | |
| - Random subset of features for splitting at each node. | |
| --- | |
| ### How It Works: | |
| 1. **Bootstrap sampling**: Random subsets of the training data are created with replacement. | |
| 2. **Train multiple Decision Trees** on different subsets. | |
| 3. Each tree makes a prediction. | |
| 4. The final output is: | |
| - **Majority vote** (for classification). | |
| - **Average prediction** (for regression). | |
| --- | |
| ### Key Benefits: | |
| - Handles **high-dimensional** data well. | |
| - Reduces **overfitting** (more than a single Decision Tree). | |
| - Works for both **classification** and **regression** tasks. | |
| - **Feature importance** is easy to extract. | |
| --- | |
| ### Real-Life Analogy: | |
| Imagine asking a **group of experts** instead of one person β each tree gives their opinion, and the forest makes the final decision based on consensus! | |
| """) | |
| st.subheader(":blue[**Random Forest: Bagging Ensemble**]") | |
| st.markdown(""" | |
| **Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist: | |
| --- | |
| ### Bagging Recap: | |
| - **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement). | |
| - Final prediction is made by **aggregating** outputs from all models: | |
| - Majority vote (Classification) | |
| - Average (Regression) | |
| --- | |
| ### What Makes Random Forest Special? | |
| β Uses **Bagging** to build multiple Decision Trees | |
| β Adds **randomness in feature selection** at each split in a tree | |
| β Helps make each tree **less correlated** β more powerful ensemble | |
| --- | |
| ### How Random Forest Works: | |
| 1. Create many bootstrap samples from the training data. | |
| 2. Train a **Decision Tree** on each sample. | |
| 3. At each split in the tree, only consider a **random subset of features**. | |
| 4. Combine all trees: | |
| - For classification β **Majority voting** | |
| - For regression β **Averaging** | |
| --- | |
| ### Why Random Forest Works Well: | |
| - Handles **high-dimensional** data. | |
| - Reduces **variance** and **overfitting**. | |
| - More stable than individual decision trees. | |
| """) | |
| st.subheader(":blue[**Bagging Algorithm in Random Forest**]") | |
| st.markdown(""" | |
| ### π§Ί What is Bagging? | |
| **Bagging** (Bootstrap Aggregating) is an ensemble technique that: | |
| - Trains multiple models on **random samples** of the data (with replacement). | |
| - Aggregates the predictions to make the final decision. | |
| - **Classification** β Majority vote | |
| - **Regression** β Average | |
| --- | |
| ### How Random Forest Uses Bagging: | |
| **Random Forest = Bagging + Random Feature Selection** | |
| #### Here's what happens: | |
| 1. It builds **many decision trees** using **bootstrapped datasets** (Bagging). | |
| 2. When splitting a node, it uses a **random subset of features**. | |
| 3. It aggregates the predictions of all trees. | |
| This makes Random Forest **more diverse**, **less correlated**, and **more accurate** than basic bagging with full-feature trees. | |
| --- | |
| ### Why Bagging Helps Random Forest: | |
| - Reduces **overfitting** by combining diverse learners. | |
| - Lowers **variance** of predictions. | |
| - Makes the model **robust and stable**. | |
| """) | |
| st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]") | |
| st.markdown(""" | |
| ### What is Bagging? | |
| **Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions. | |
| --- | |
| ### For Classification: | |
| - Uses a **voting mechanism**: | |
| - Each model votes for a class. | |
| - The final prediction is the **majority class**. | |
| #### Advantages: | |
| - Reduces **overfitting** | |
| - Decreases **variance** | |
| - Works well with **unstable learners** like Decision Trees | |
| --- | |
| ### For Regression: | |
| - Uses **averaging**: | |
| - Each model makes a numerical prediction. | |
| - The final output is the **average** of all predictions. | |
| #### Benefits: | |
| - Produces **smoother** predictions | |
| - Helps with **noisy datasets** | |
| - Improves **model generalization** | |
| --- | |
| ### Common Base Estimator: | |
| - `DecisionTreeClassifier` for classification | |
| - `DecisionTreeRegressor` for regression | |
| Scikit-learnβs `BaggingClassifier` and `BaggingRegressor` are often used. | |
| """) | |