import streamlit as st
st.title(":red[**Introduction to Ensemble Learning**]")
st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.
The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.
---
### Why Use Ensemble Methods?
- Improves performance and stability.
- Reduces the risk of overfitting.
- Works well in both classification and regression tasks.
- Often wins data science competitions (e.g., Kaggle).
---
### Common Ensemble Techniques
1. **Bagging** (Bootstrap Aggregating)
- Builds multiple models in parallel.
- Reduces **variance**.
- Example: `RandomForest`
2. **Boosting**
- Builds models sequentially, each correcting errors from the previous.
- Reduces **bias**.
- Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`
3. **Stacking**
- Combines different model types.
- A meta-model learns how to best combine them.
---
### Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).
---
**In short:** Ensemble learning = smarter predictions by making models work together.
""")
st.subheader(":blue[**Voting Ensemble (Classifier)**]")
st.markdown("""
In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**.
---
### Types of Voting:
#### Hard Voting
- Each model votes for a class label.
- The final prediction is the **majority vote**.
- Useful when all models are equally good.
#### Soft Voting
- Uses **predicted probabilities** from models.
- Averages probabilities and picks the class with the **highest average probability**.
- Works best when base models are **well-calibrated**.
---
### Why Use Voting?
- Combines **strengths** of different models.
- Reduces the **risk of overfitting**.
- Often **improves accuracy** over individual models.
""")
st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]")
st.markdown("""
**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms.
It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees.
---
### How It Works:
1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement).
2. Train a **separate model** on each subset.
3. Aggregate the predictions:
- For **classification**: majority vote.
- For **regression**: average.
---
### Key Points:
- Models are trained **independently and in parallel**.
- Often used with **Decision Trees**.
- Final prediction is **more robust** than any individual model.
---
### Example:
A well-known example of Bagging is the **Random Forest** algorithm:
- Uses multiple decision trees trained on bootstrapped samples.
- Adds feature randomness for further diversity.
""")
st.title("What is Random Forest?")
st.markdown("""
**Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions.
It is based on the **Bagging** technique and introduces **randomness** at two levels:
- Random sampling of data (bootstrap samples).
- Random subset of features for splitting at each node.
---
### How It Works:
1. **Bootstrap sampling**: Random subsets of the training data are created with replacement.
2. **Train multiple Decision Trees** on different subsets.
3. Each tree makes a prediction.
4. The final output is:
- **Majority vote** (for classification).
- **Average prediction** (for regression).
---
### Key Benefits:
- Handles **high-dimensional** data well.
- Reduces **overfitting** (more than a single Decision Tree).
- Works for both **classification** and **regression** tasks.
- **Feature importance** is easy to extract.
---
### Real-Life Analogy:
Imagine asking a **group of experts** instead of one person – each tree gives its opinion, and the forest makes the final decision based on consensus!
""")
st.subheader(":blue[**Random Forest: Bagging Ensemble**]")
st.markdown("""
**Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist:
---
### Bagging Recap:
- **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement).
- Final prediction is made by **aggregating** outputs from all models:
- Majority vote (Classification)
- Average (Regression)
---
### What Makes Random Forest Special?
βœ… Uses **Bagging** to build multiple Decision Trees
βœ… Adds **randomness in feature selection** at each split in a tree
βœ… Helps make each tree **less correlated** β†’ more powerful ensemble
---
### How Random Forest Works:
1. Create many bootstrap samples from the training data.
2. Train a **Decision Tree** on each sample.
3. At each split in the tree, only consider a **random subset of features**.
4. Combine all trees:
- For classification β†’ **Majority voting**
- For regression β†’ **Averaging**
---
### Why Random Forest Works Well:
- Handles **high-dimensional** data.
- Reduces **variance** and **overfitting**.
- More stable than individual decision trees.
""")
st.subheader(":blue[**Bagging Algorithm in Random Forest**]")
st.markdown("""
### 🧺 What is Bagging?
**Bagging** (Bootstrap Aggregating) is an ensemble technique that:
- Trains multiple models on **random samples** of the data (with replacement).
- Aggregates the predictions to make the final decision.
- **Classification** β†’ Majority vote
- **Regression** β†’ Average
---
### How Random Forest Uses Bagging:
**Random Forest = Bagging + Random Feature Selection**
#### Here's what happens:
1. It builds **many decision trees** using **bootstrapped datasets** (Bagging).
2. When splitting a node, it uses a **random subset of features**.
3. It aggregates the predictions of all trees.
This makes the individual trees **more diverse** and **less correlated**, which usually makes Random Forest **more accurate** than basic bagging with full-feature trees (see the comparison sketch below).
---
### Why Bagging Helps Random Forest:
- Reduces **overfitting** by combining diverse learners.
- Lowers **variance** of predictions.
- Makes the model **robust and stable**.
""")
st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]")
st.markdown("""
### What is Bagging?
**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions.
---
### For Classification:
- Uses a **voting mechanism**:
- Each model votes for a class.
- The final prediction is the **majority class**.
#### Advantages:
- Reduces **overfitting**
- Decreases **variance**
- Works well with **unstable learners** like Decision Trees
---
### For Regression:
- Uses **averaging**:
- Each model makes a numerical prediction.
- The final output is the **average** of all predictions.
#### Benefits:
- Produces **smoother** predictions
- Helps with **noisy datasets**
- Improves **model generalization**
---
### Common Base Estimator:
- `DecisionTreeClassifier` for classification
- `DecisionTreeRegressor` for regression
Scikit-learn’s `BaggingClassifier` and `BaggingRegressor` are often used.
""")