import streamlit as st

st.title(":red[**Introduction to Ensemble Learning**]")

st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.

The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.

---

### Why Use Ensemble Methods?
- Improves performance and stability.
- Reduces the risk of overfitting.
- Works well in both classification and regression tasks.
- Often wins data science competitions (e.g., Kaggle).

---

### Common Ensemble Techniques
1. **Bagging** (Bootstrap Aggregating)
   - Builds multiple models in parallel.
   - Reduces **variance**.
   - Example: `RandomForest`
2. **Boosting**
   - Builds models sequentially, each correcting errors from the previous.
   - Reduces **bias**.
   - Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`
3. **Stacking**
   - Combines different model types.
   - A meta-model learns how to best combine them.

---

### Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).

---

**In short:** Ensemble learning = smarter models by working together. A short code sketch of these techniques is shown below.
""")
st.subheader(":blue[**Voting Ensemble (Classifier)**]") | |
st.markdown(""" | |
In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**. | |
--- | |
### Types of Voting: | |
#### Hard Voting | |
- Each model votes for a class label. | |
- The final prediction is the **majority vote**. | |
- Useful when all models are equally good. | |
#### Soft Voting | |
- Uses **predicted probabilities** from models. | |
- Averages probabilities and picks the class with the **highest average probability**. | |
- Works best when base models are **well-calibrated**. | |
--- | |
### Why Use Voting? | |
- Combines **strengths** of different models. | |
- Reduces the **risk of overfitting**. | |
- Often **improves accuracy** over individual models. | |
""") | |
st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]") | |
st.markdown(""" | |
**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms. | |
It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees. | |
--- | |
### How It Works: | |
1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement). | |
2. Train a **separate model** on each subset. | |
3. Aggregate the predictions: | |
- For **classification**: majority vote. | |
- For **regression**: average. | |
--- | |
### Key Points: | |
- Models are trained **independently and in parallel**. | |
- Often used with **Decision Trees**. | |
- Final prediction is **more robust** than any individual model. | |
--- | |
### Example: | |
A well-known example of Bagging is the **Random Forest** algorithm: | |
- Uses multiple decision trees trained on bootstrapped samples. | |
- Adds feature randomness for further diversity. | |
""") | |
st.title("What is Random Forest?") | |
st.markdown(""" | |
**Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions. | |
It is based on the **Bagging** technique and introduces **randomness** at two levels: | |
- Random sampling of data (bootstrap samples). | |
- Random subset of features for splitting at each node. | |
--- | |
### How It Works: | |
1. **Bootstrap sampling**: Random subsets of the training data are created with replacement. | |
2. **Train multiple Decision Trees** on different subsets. | |
3. Each tree makes a prediction. | |
4. The final output is: | |
- **Majority vote** (for classification). | |
- **Average prediction** (for regression). | |
--- | |
### Key Benefits: | |
- Handles **high-dimensional** data well. | |
- Reduces **overfitting** (more than a single Decision Tree). | |
- Works for both **classification** and **regression** tasks. | |
- **Feature importance** is easy to extract. | |
--- | |
### Real-Life Analogy: | |
Imagine asking a **group of experts** instead of one person β each tree gives their opinion, and the forest makes the final decision based on consensus! | |
""") | |
st.subheader(":blue[**Random Forest: Bagging Ensemble**]") | |
st.markdown(""" | |
**Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist: | |
--- | |
### Bagging Recap: | |
- **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement). | |
- Final prediction is made by **aggregating** outputs from all models: | |
- Majority vote (Classification) | |
- Average (Regression) | |
--- | |
### What Makes Random Forest Special? | |
β Uses **Bagging** to build multiple Decision Trees | |
β Adds **randomness in feature selection** at each split in a tree | |
β Helps make each tree **less correlated** β more powerful ensemble | |
--- | |
### How Random Forest Works: | |
1. Create many bootstrap samples from the training data. | |
2. Train a **Decision Tree** on each sample. | |
3. At each split in the tree, only consider a **random subset of features**. | |
4. Combine all trees: | |
- For classification β **Majority voting** | |
- For regression β **Averaging** | |
--- | |
### Why Random Forest Works Well: | |
- Handles **high-dimensional** data. | |
- Reduces **variance** and **overfitting**. | |
- More stable than individual decision trees. | |
""") | |
st.subheader(":blue[**Bagging Algorithm in Random Forest**]") | |
st.markdown(""" | |
### π§Ί What is Bagging? | |
**Bagging** (Bootstrap Aggregating) is an ensemble technique that: | |
- Trains multiple models on **random samples** of the data (with replacement). | |
- Aggregates the predictions to make the final decision. | |
- **Classification** β Majority vote | |
- **Regression** β Average | |
--- | |
### How Random Forest Uses Bagging: | |
**Random Forest = Bagging + Random Feature Selection** | |
#### Here's what happens: | |
1. It builds **many decision trees** using **bootstrapped datasets** (Bagging). | |
2. When splitting a node, it uses a **random subset of features**. | |
3. It aggregates the predictions of all trees. | |
This makes Random Forest **more diverse**, **less correlated**, and **more accurate** than basic bagging with full-feature trees. | |
--- | |
### Why Bagging Helps Random Forest: | |
- Reduces **overfitting** by combining diverse learners. | |
- Lowers **variance** of predictions. | |
- Makes the model **robust and stable**. | |
""") | |
st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]") | |
st.markdown(""" | |
### What is Bagging? | |
**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions. | |
--- | |
### For Classification: | |
- Uses a **voting mechanism**: | |
- Each model votes for a class. | |
- The final prediction is the **majority class**. | |
#### Advantages: | |
- Reduces **overfitting** | |
- Decreases **variance** | |
- Works well with **unstable learners** like Decision Trees | |
--- | |
### For Regression: | |
- Uses **averaging**: | |
- Each model makes a numerical prediction. | |
- The final output is the **average** of all predictions. | |
#### Benefits: | |
- Produces **smoother** predictions | |
- Helps with **noisy datasets** | |
- Improves **model generalization** | |
--- | |
### Common Base Estimator: | |
- `DecisionTreeClassifier` for classification | |
- `DecisionTreeRegressor` for regression | |
Scikit-learnβs `BaggingClassifier` and `BaggingRegressor` are often used. | |
""") | |