import streamlit as st
st.title(":red[**Introduction to Ensemble Learning**]")
st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.
The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.
---
### Why Use Ensemble Methods?
- Improves performance and stability.
- Reduces the risk of overfitting.
- Works well in both classification and regression tasks.
- Often wins data science competitions (e.g., Kaggle).
---
### Common Ensemble Techniques
1. **Bagging** (Bootstrap Aggregating)
- Builds multiple models in parallel.
- Reduces **variance**.
- Example: `RandomForest`
2. **Boosting**
- Builds models sequentially, each correcting errors from the previous.
- Reduces **bias**.
- Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`
3. **Stacking**
- Combines different model types.
- A meta-model learns how to best combine them.
---
### Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).
---
**In short:** Ensemble learning = smarter predictions by making models work together.
""")
st.subheader(":blue[**Voting Ensemble (Classifier)**]")
st.markdown("""
In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**.
---
### Types of Voting:
#### Hard Voting
- Each model votes for a class label.
- The final prediction is the **majority vote**.
- Useful when all models are equally good.
#### Soft Voting
- Uses **predicted probabilities** from models.
- Averages probabilities and picks the class with the **highest average probability**.
- Works best when base models are **well-calibrated**.
---
### Why Use Voting?
- Combines **strengths** of different models.
- Reduces the **risk of overfitting**.
- Often **improves accuracy** over individual models.
""")
st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]")
st.markdown("""
**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms.
It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees.
---
### How It Works:
1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement).
2. Train a **separate model** on each subset.
3. Aggregate the predictions:
- For **classification**: majority vote.
- For **regression**: average.
---
### Key Points:
- Models are trained **independently and in parallel**.
- Often used with **Decision Trees**.
- Final prediction is **more robust** than any individual model.
---
### Example:
A well-known example of Bagging is the **Random Forest** algorithm:
- Uses multiple decision trees trained on bootstrapped samples.
- Adds feature randomness for further diversity.
""")
st.title("What is Random Forest?")
st.markdown("""
**Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions.
It is based on the **Bagging** technique and introduces **randomness** at two levels:
- Random sampling of data (bootstrap samples).
- Random subset of features for splitting at each node.
---
### How It Works:
1. **Bootstrap sampling**: Random subsets of the training data are created with replacement.
2. **Train multiple Decision Trees** on different subsets.
3. Each tree makes a prediction.
4. The final output is:
- **Majority vote** (for classification).
- **Average prediction** (for regression).
---
### Key Benefits:
- Handles **high-dimensional** data well.
- Reduces **overfitting** (more than a single Decision Tree).
- Works for both **classification** and **regression** tasks.
- **Feature importance** is easy to extract.
---
### Real-Life Analogy:
Imagine asking a **group of experts** instead of one person – each tree gives its opinion, and the forest makes the final decision based on consensus!
""")
st.subheader(":blue[**Random Forest: Bagging Ensemble**]")
st.markdown("""
**Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist:
---
### Bagging Recap:
- **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement).
- Final prediction is made by **aggregating** outputs from all models:
- Majority vote (Classification)
- Average (Regression)
---
### What Makes Random Forest Special?
βœ… Uses **Bagging** to build multiple Decision Trees
βœ… Adds **randomness in feature selection** at each split in a tree
βœ… Helps make each tree **less correlated** β†’ more powerful ensemble
---
### How Random Forest Works:
1. Create many bootstrap samples from the training data.
2. Train a **Decision Tree** on each sample.
3. At each split in the tree, only consider a **random subset of features**.
4. Combine all trees:
- For classification β†’ **Majority voting**
- For regression β†’ **Averaging**
---
### Why Random Forest Works Well:
- Handles **high-dimensional** data.
- Reduces **variance** and **overfitting**.
- More stable than individual decision trees.
""")
st.subheader(":blue[**Bagging Algorithm in Random Forest**]")
st.markdown("""
### 🧺 What is Bagging?
**Bagging** (Bootstrap Aggregating) is an ensemble technique that:
- Trains multiple models on **random samples** of the data (with replacement).
- Aggregates the predictions to make the final decision.
- **Classification** β†’ Majority vote
- **Regression** β†’ Average
---
### How Random Forest Uses Bagging:
**Random Forest = Bagging + Random Feature Selection**
#### Here's what happens:
1. It builds **many decision trees** using **bootstrapped datasets** (Bagging).
2. When splitting a node, it uses a **random subset of features**.
3. It aggregates the predictions of all trees.
This makes the individual trees **more diverse** and **less correlated**, which usually makes Random Forest **more accurate** than basic bagging with full-feature trees (see the comparison sketch below).
---
### Why Bagging Helps Random Forest:
- Reduces **overfitting** by combining diverse learners.
- Lowers **variance** of predictions.
- Makes the model **robust and stable**.
""")
st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]")
st.markdown("""
### What is Bagging?
**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions.
---
### For Classification:
- Uses a **voting mechanism**:
- Each model votes for a class.
- The final prediction is the **majority class**.
#### Advantages:
- Reduces **overfitting**
- Decreases **variance**
- Works well with **unstable learners** like Decision Trees
---
### For Regression:
- Uses **averaging**:
- Each model makes a numerical prediction.
- The final output is the **average** of all predictions.
#### Benefits:
- Produces **smoother** predictions
- Helps with **noisy datasets**
- Improves **model generalization**
---
### Common Base Estimator:
- `DecisionTreeClassifier` for classification
- `DecisionTreeRegressor` for regression
Scikit-learn’s `BaggingClassifier` and `BaggingRegressor` are often used.
""")