Spaces: Runtime error
improve docs

app.py CHANGED
@@ -152,29 +152,38 @@ with gr.Blocks() as demo:
     # High-level title and description
     gr.Markdown(
         """
-        #
+        # Introduction to Gradient Boosting

         This Space demonstrates how to train a [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#gradientboostingclassifier) from **scikit-learn** on **tabular datasets** hosted on the [Hugging Face Hub](https://huggingface.co/datasets).

+        Gradient Boosting is an ensemble machine learning technique that combines many weak learners (usually small decision trees) in an iterative, stage-wise fashion to create a stronger overall model.
+        In each step, the algorithm fits a new weak learner to the current errors of the combined ensemble, effectively allowing the model to focus on the hardest-to-predict data points.
+        By repeatedly adding these specialized trees, Gradient Boosting can capture complex patterns and deliver high predictive accuracy, especially on tabular data.
+
+        **Put simply, Gradient Boosting makes a big deal out of small anomalies!**
+
         **Purpose**:
-        -
+        - Easily explore hyperparameters (_learning_rate, n_estimators, max_depth_) and quickly train an ML model on real data.
         - Visualise model performance via a confusion matrix heatmap and a feature importance plot.

-        **
+        **Notes**:
+        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
+        - Large datasets may take time to download/train.
+        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
+        - The feature importance plot shows which features the model relies on the most for its predictions.
+
+        ---
+
+        **Usage**:
         1. Select one of the suggested datasets from the dropdown _or_ enter any valid dataset from the [Hugging Face Hub](https://huggingface.co/datasets).
         2. Click **Load Columns** to retrieve the column names from the dataset's **train** split.
         3. Choose exactly _one_ **Label column** (the target) and one or more **Feature columns** (the inputs).
         4. Adjust hyperparameters (learning_rate, n_estimators, max_depth, test_size).
         5. Click **Train & Evaluate** to train a Gradient Boosting model and see its accuracy, feature importances, and confusion matrix.

-        ---
-        **Please Note**:
-        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
-        - Large datasets may take time to download/train.
-        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
-        - The feature importance plot shows which features the model relies on the most for its predictions.
-
         You are now a machine learning engineer, congratulations 🤗
+
+        ---
         """
     )
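The stage-wise idea described in the new docs (each weak learner is fit to the current errors of the ensemble) can be illustrated with a toy sketch. This is not the Space's code: it hand-rolls a boosted regressor from depth-1 trees on synthetic data purely to show the mechanism.

```python
# Toy illustration of stage-wise boosting: each new depth-1 tree is fit to
# the residual errors of the ensemble built so far (synthetic data, not the
# Space's actual code).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.full_like(y, y.mean())  # stage 0: a constant prediction
for _ in range(100):
    residual = y - pred                          # current errors of the ensemble
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += learning_rate * stump.predict(X)     # add the new weak learner

mse_before = np.mean((y - y.mean()) ** 2)  # error of the constant model
mse_after = np.mean((y - pred) ** 2)       # error after 100 boosting stages
```

Each stump alone is a poor model, but because every stage targets what the ensemble still gets wrong, the combined prediction steadily improves.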
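The five usage steps in the diff above can be sketched outside the UI as well. The snippet below is a minimal stand-in, not the Space's implementation: it uses scikit-learn's built-in iris data in place of a Hub dataset's **train** split (so it runs offline), with the same hyperparameters the Space exposes.

```python
# Minimal sketch of the Space's train-and-evaluate flow. Assumption: the
# built-in iris frame stands in for a tabular Hub dataset's "train" split.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Steps 1-3: load the data and pick feature columns plus one label column.
data = load_iris(as_frame=True)
X, y = data.data, data.target

# Step 4: the hyperparameters the Space exposes as controls.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)

# Step 5: accuracy, confusion matrix (diagonal = correct), feature importances.
pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
importances = dict(zip(X.columns, clf.feature_importances_))
```

The Space wraps exactly this kind of loop in Gradio controls and renders `cm` as a heatmap and `importances` as a bar plot.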