Update pages/3Ensemble_Techniques.py
pages/3Ensemble_Techniques.py (+159 −0)
@@ -0,0 +1,159 @@
import streamlit as st

# Page configuration
st.set_page_config(page_title="Ensemble Techniques", page_icon="🤖", layout="wide")

# Custom styling
st.markdown("""
<style>
.stApp {
    background-color: #f2f6fa;
}
h1, h2, h3 {
    color: #1a237e;
}
.custom-font, p, li {
    font-family: 'Arial', sans-serif;
    font-size: 18px;
    color: #212121;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)

# Title
st.markdown("<h1>Ensemble Learning Techniques</h1>", unsafe_allow_html=True)

# Introduction
st.markdown("""
Ensemble learning is a strategy in machine learning where **multiple models**—called base models—are combined to produce a more accurate and robust **ensemble model**. The core idea is that a group of diverse models often performs better than any individual model alone.
""", unsafe_allow_html=True)

st.markdown("**Assumption:** The base models should be **diverse**. If they are too similar, the overall ensemble may lose its advantage and yield poor results.")

# Types of Ensemble
st.markdown("<h2>Types of Ensemble Techniques</h2>", unsafe_allow_html=True)
st.write("Ensemble techniques vary based on how base models are built and how their outputs are combined.")
st.image("diff_ensemble_tecniques.png", width=900)

# Voting Ensemble
st.markdown("<h2>1. Voting Ensemble</h2>", unsafe_allow_html=True)
st.write("Voting is a straightforward ensemble approach suitable for both classification and regression. It aggregates the predictions from multiple models to make the final prediction.")

st.write("**Types:**")
st.write("- **Hard Voting**: Final output is the most frequent class label among base models.")
st.write("- **Soft Voting**: Uses the average of class probabilities to decide the output.")

st.markdown("**Steps for Classification:**")
st.markdown("""
1. Select different base models.
2. Train each on the same dataset.
3. Gather predictions.
4. Use hard or soft voting to finalize.
""")

st.image("voting.jpg", width=900)

st.markdown("**Steps for Regression:**")
st.markdown("""
1. Train various regression models.
2. Get predictions from all models.
3. Calculate the average or median of predictions.
""")

st.markdown("**Important Parameters:**")
st.markdown("- `voting`: Choose between 'hard' or 'soft' voting\n- `weights`: Assign relative importance to models")
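The `voting` and `weights` parameters above correspond to scikit-learn's `VotingClassifier`. A minimal sketch, with illustrative model choices and a synthetic dataset (not part of the page's linked notebook):

```python
# Soft-voting ensemble sketch; base models and weights are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Diverse base models; `weights` gives the logistic model twice the influence.
clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("nb", GaussianNB()),
    ],
    voting="soft",      # average class probabilities; "hard" = majority label
    weights=[2, 1, 1],
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Soft voting requires every base model to expose `predict_proba`, which all three models here do.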

# Voting implementation link
st.markdown("<h2>Voting Implementation Example</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/1LPZR9RnvEXP8mzOLOBfSVVyHHZ7GFns4?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
    unsafe_allow_html=True
)

# Bagging
st.markdown("<h2>2. Bagging (Bootstrap Aggregating)</h2>", unsafe_allow_html=True)
st.write("Bagging boosts model performance by training the same algorithm on different random subsets (with replacement) of the dataset.")

st.write("Unlike voting, bagging keeps the algorithm fixed and varies the training data to create diverse models.")

st.write("**Variants:**")
st.write("- **Bagging**: General form, any model can be used.")
st.write("- **Random Forest**: Special form using decision trees with added randomness.")

st.image("bagging.jpg", width=900)

st.markdown("**Steps for Classification:**")
st.markdown("""
1. Generate bootstrapped samples.
2. Train models on each sample.
3. Aggregate outputs using majority vote.
""")

st.markdown("**Steps for Regression:**")
st.markdown("""
1. Create random samples from the dataset.
2. Train models on each.
3. Average the predictions.
""")

st.markdown("<h2>How to Create Bootstrapped Samples</h2>", unsafe_allow_html=True)
st.write("**Row and Column Sampling** help increase model diversity in bagging.")

st.write("**Row Sampling:**")
st.write("- With Replacement: Duplicates allowed (classic bootstrapping)")
st.write("- Without Replacement: Unique rows only (pasting)")

st.write("**Column Sampling:**")
st.write("- With Replacement: Some features may repeat.")
st.write("- Without Replacement: Each feature is used only once per model.")
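The row and column sampling schemes described above can be sketched directly with NumPy (array shapes and sample sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)  # 10 rows, 2 feature columns

# Row sampling WITH replacement (classic bootstrapping): duplicates allowed.
boot_idx = rng.integers(0, len(X), size=len(X))
X_boot = X[boot_idx]

# Row sampling WITHOUT replacement (pasting): unique rows only.
paste_idx = rng.choice(len(X), size=7, replace=False)
X_paste = X[paste_idx]

# Column sampling without replacement: each feature used at most once.
col_idx = rng.choice(X.shape[1], size=1, replace=False)
X_cols = X[:, col_idx]
```

Each base model in a bagging ensemble would be trained on its own resampled `X_boot` (or `X_paste`/`X_cols`), which is what creates the diversity the technique relies on.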

st.markdown("**Important Parameters:**")
st.markdown("- `n_estimators`: Number of models to train\n- `max_samples`: Fraction (or count) of samples drawn for each model\n- `bootstrap`: Whether row sampling is done with replacement")

# Bagging implementation link
st.markdown("<h2>Bagging Implementation Example</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/1cumZl7H9fqyORfaw236WWxQViJxvSKHV?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
    unsafe_allow_html=True
)

# Random Forest
st.markdown("<h2>3. Random Forest</h2>", unsafe_allow_html=True)
st.write("Random Forest is a popular ensemble method that builds multiple decision trees using bootstrapped samples. It adds another layer of randomness by selecting a subset of features at each split.")

st.image("randomforest.jpg", width=900)

st.markdown("**Steps for Classification:**")
st.markdown("""
1. Create bootstrapped samples.
2. Train decision trees using random feature selection at each split.
3. Combine predictions using majority vote.
""")

st.markdown("**Steps for Regression:**")
st.markdown("""
1. Prepare bootstrapped training sets.
2. Train decision tree regressors with random feature splits.
3. Predict by averaging model outputs.
""")

st.markdown("**Bagging vs Random Forest:**")
st.markdown("""
- **Bagging:** Works with any base algorithm; row/column sampling is optional
- **Random Forest:** Uses decision trees only; always samples rows and features
- **Bagging:** Base models have no internal randomness; diversity comes from data sampling alone
- **Random Forest:** Adds extra randomness inside each tree via per-split feature selection
""")
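The three steps above map directly onto scikit-learn's `RandomForestClassifier`, where `max_features` controls the per-split feature subsetting. A minimal sketch with illustrative hyperparameter values:

```python
# Random forest sketch; hyperparameter values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees grown on bootstrapped samples
    max_features="sqrt",  # random subset of features considered at each split
    random_state=42,
)
rf.fit(X_train, y_train)
rf_accuracy = rf.score(X_test, y_test)
```

For regression, `RandomForestRegressor` follows the same pattern and averages the tree outputs instead of taking a majority vote.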

# Random Forest implementation link
st.markdown("<h2>Random Forest Implementation Example</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/1S6YyfTx9N35E5fpPF0z6ZDm85BSp1deT?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
    unsafe_allow_html=True
)

# Conclusion
st.markdown("""
Ensemble learning is a powerful approach that enhances model accuracy, reduces overfitting, and improves robustness. Choosing between techniques like **Voting**, **Bagging**, and **Random Forest** depends on your use case and the nature of the data.
""", unsafe_allow_html=True)