Spaces: Runtime error
improve docs

app.py CHANGED
@@ -152,29 +152,38 @@ with gr.Blocks() as demo:
     # High-level title and description
     gr.Markdown(
         """
-        #
+        # Introduction to Gradient Boosting

         This Space demonstrates how to train a [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#gradientboostingclassifier) from **scikit-learn** on **tabular datasets** hosted on the [Hugging Face Hub](https://huggingface.co/datasets).

+        Gradient Boosting is an ensemble machine learning technique that combines many weak learners (usually small decision trees) in an iterative, stage-wise fashion to create a stronger overall model.
+        In each step, the algorithm fits a new weak learner to the current errors of the combined ensemble, effectively allowing the model to focus on the hardest-to-predict data points.
+        By repeatedly adding these specialized trees, Gradient Boosting can capture complex patterns and deliver high predictive accuracy, especially on tabular data.
+
+        **Put simply, Gradient Boosting makes a big deal out of small anomalies!**
+
         **Purpose**:
-        -
+        - Easily explore hyperparameters (_learning_rate, n_estimators, max_depth_) and quickly train an ML model on real data.
         - Visualise model performance via a confusion matrix heatmap and a feature importance plot.

-        **
+        **Notes**:
+        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
+        - Large datasets may take time to download/train.
+        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
+        - The feature importance plot shows which features the model relies on the most for its predictions.
+
+        ---
+
+        **Usage**:
         1. Select one of the suggested datasets from the dropdown _or_ enter any valid dataset from the [Hugging Face Hub](https://huggingface.co/datasets).
         2. Click **Load Columns** to retrieve the column names from the dataset's **train** split.
         3. Choose exactly _one_ **Label column** (the target) and one or more **Feature columns** (the inputs).
         4. Adjust hyperparameters (learning_rate, n_estimators, max_depth, test_size).
         5. Click **Train & Evaluate** to train a Gradient Boosting model and see its accuracy, feature importances, and confusion matrix.

-        ---
-        **Please Note**:
-        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
-        - Large datasets may take time to download/train.
-        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
-        - The feature importance plot shows which features the model relies on the most for its predictions.
-
         You are now a machine learning engineer, congratulations 🤗
+
+        ---
         """
     )
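The stage-wise idea described in the new docs (each weak learner is fit to the current errors of the ensemble) can be illustrated with a toy sketch. This is not the Space's code: it hand-rolls a boosted regressor from depth-1 trees on synthetic data purely to show the mechanism.

```python
# Toy illustration of stage-wise boosting: each new depth-1 tree is fit to
# the residual errors of the ensemble built so far (synthetic data, not the
# Space's actual code).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.full_like(y, y.mean())  # stage 0: a constant prediction
for _ in range(100):
    residual = y - pred                          # current errors of the ensemble
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += learning_rate * stump.predict(X)     # add the new weak learner

mse_before = np.mean((y - y.mean()) ** 2)  # error of the constant model
mse_after = np.mean((y - pred) ** 2)       # error after 100 boosting stages
```

Each stump alone is a poor model, but because every stage targets what the ensemble still gets wrong, the combined prediction steadily improves.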
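The five usage steps in the diff above can be sketched outside the UI as well. The snippet below is a minimal stand-in, not the Space's implementation: it uses scikit-learn's built-in iris data in place of a Hub dataset's **train** split (so it runs offline), with the same hyperparameters the Space exposes.

```python
# Minimal sketch of the Space's train-and-evaluate flow. Assumption: the
# built-in iris frame stands in for a tabular Hub dataset's "train" split.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Steps 1-3: load the data and pick feature columns plus one label column.
data = load_iris(as_frame=True)
X, y = data.data, data.target

# Step 4: the hyperparameters the Space exposes as controls.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)

# Step 5: accuracy, confusion matrix (diagonal = correct), feature importances.
pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
importances = dict(zip(X.columns, clf.feature_importances_))
```

The Space wraps exactly this kind of loop in Gradio controls and renders `cm` as a heatmap and `importances` as a bar plot.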