Iris Classification Models

This repository starts with a Decision Tree model trained on the classic Iris dataset. The model classifies iris flowers into three species—setosa, versicolor, or virginica—based on four numeric features (sepal length, sepal width, petal length, and petal width).

Because of its small size and simplicity, this model is intended primarily for demonstration and educational purposes.

Model Description

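Framework: scikit-learn
Algorithm: Decision Tree (DecisionTreeClassifier class)
Hyperparameters:
- scikit-learn defaults for DecisionTreeClassifier (for example, criterion="gini" and max_depth=None, so the tree grows until its leaves are pure)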
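
For readers who want to see what "defaults" means in practice, the following minimal sketch prints a few hyperparameters of an unconfigured DecisionTreeClassifier. The selection of parameters shown is illustrative, and the values reported depend on the installed scikit-learn version.

```python
from sklearn.tree import DecisionTreeClassifier

# Instantiate the classifier with no arguments, i.e. with scikit-learn's defaults.
clf = DecisionTreeClassifier()

# get_params() returns every hyperparameter together with its current value.
params = clf.get_params()
for name in ("criterion", "splitter", "max_depth",
             "min_samples_split", "min_samples_leaf", "random_state"):
    print(f"{name} = {params[name]!r}")
```

Because random_state defaults to None, retraining on the same data can produce slightly different trees when several candidate splits are equally good.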

Intended Uses

Education/Proof-of-Concept: Demonstrates loading a scikit-learn model from the Hugging Face Hub.
Beginner ML Tutorials: Introduces classification tasks, Hugging Face model hosting, and the deployment of simple demos in Spaces.

Limitations

Dataset Size: The Iris dataset is small (150 samples). Performance metrics may not extrapolate to real-world scenarios.
Domain Constraints: The dataset only covers three iris species and may not generalize to other types of flowers.
Not Production-Ready: This model is not suited for critical applications (e.g., healthcare, autonomous vehicles).
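
The domain constraint can be made concrete with a small sketch. It assumes a stand-in tree trained on scikit-learn's bundled copy of Iris rather than the hosted model, and the input values are deliberately implausible. A classifier of this kind has no "unknown" class, so it still returns one of the three species.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on scikit-learn's bundled Iris data (an assumption;
# the hosted model was trained on the 'brjapon/iris' dataset).
iris = load_iris()
stand_in = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Measurements far outside anything seen in training (not a real iris flower).
implausible = [[50.0, 30.0, 20.0, 10.0]]

pred = stand_in.predict(implausible)[0]
proba = stand_in.predict_proba(implausible)[0]

# The model still names a species and reports class probabilities for it.
print(iris.target_names[pred], proba)
```

If out-of-range inputs are possible in a demo, a simple range check on the four features before calling predict is one way to flag them.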

How to Use

To use this model, you can load the .joblib file from the Hub in Python code:

import joblib
from huggingface_hub import hf_hub_download

# Download the serialized model from the Hub (cached locally after the first call).
# The accompanying dataset is hosted on Hugging Face under 'brjapon/iris'.
model_path = hf_hub_download(repo_id="brjapon/iris",
                             filename="iris_dt.joblib",
                             repo_type="model")

model = joblib.load(model_path)

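# Example prediction for a single flower. Feature order: sepal length,
# sepal width, petal length, petal width (cm); these values match a typical setosa.
sample_input = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample_input)
print(prediction)  # e.g., [0], which might correspond to 'setosa' depending on the label encoding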
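
The raw output of model.predict may be an integer class index or a string label, depending on how the labels were encoded at training time. The sketch below shows one way to turn either into a species name. The SPECIES order and the to_species helper are hypothetical: the order follows scikit-learn's load_iris convention and should be checked against the hosted model before relying on it.

```python
# Assumed label order, matching scikit-learn's load_iris(); verify it against
# the labels the hosted model was actually trained on.
SPECIES = ("setosa", "versicolor", "virginica")


def to_species(label):
    """Map an integer class index to a species name; pass string labels through."""
    if isinstance(label, str):
        return label
    return SPECIES[int(label)]


# Hypothetical values standing in for the output of model.predict(...).
print([to_species(p) for p in [0, 2, 1]])
print(to_species("versicolor"))
```

On a fitted scikit-learn classifier, the classes_ attribute lists the labels the model was trained with, which is a quick way to confirm the encoding.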

Training Procedure

Training Data: 80% of the 150-sample Iris dataset (120 samples).
Test Data: The remaining 20% (30 samples), held out for evaluation.
Steps:
1. Loaded dataset (obtained from HF repository brjapon/iris)
2. Split into training and test sets with train_test_split
3. Trained Decision Tree model with default settings
4. Evaluated accuracy on the test set
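
The four steps above can be sketched roughly as follows. This is not the original training script: it assumes scikit-learn's bundled Iris copy as a stand-in for the 'brjapon/iris' dataset, and the random_state and stratify arguments are illustrative choices added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Load the data. Assumption: scikit-learn's bundled Iris copy stands in
#    for the 'brjapon/iris' dataset used in the original run.
X, y = load_iris(return_X_y=True)

# 2. Hold out 20% (30 samples) for testing. random_state and stratify are
#    illustrative settings, not ones confirmed by the source.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3. Train a Decision Tree with default hyperparameters
#    (random_state is fixed only to make the run repeatable).
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# 4. Evaluate accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

A fitted model like this one could then be serialized with joblib.dump and uploaded to the Hub, which is the form the loading example expects.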

Performance

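Using a random 80/20 split, the model typically achieves ~97% accuracy on the test subset, which corresponds to 29 of 30 test samples classified correctly. Because each test sample accounts for about 3.3 percentage points, results vary noticeably with the random_state used for the train/test split.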
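
To get a feel for how much the split matters, the sketch below repeats the 80/20 split over several seeds and reports the spread of test accuracy. It again assumes scikit-learn's bundled Iris copy as a stand-in dataset, and the number of seeds is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: scikit-learn's bundled copy of Iris (an assumption).
X, y = load_iris(return_X_y=True)

scores = []
for seed in range(20):  # 20 seeds is an arbitrary, illustrative choice
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))  # accuracy on the 30 held-out samples

mean_score = sum(scores) / len(scores)
print(f"min={min(scores):.3f} mean={mean_score:.3f} max={max(scores):.3f}")
```

Reporting the mean and range over several splits gives a steadier picture than a single run.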

Limitations & Bias

The Iris dataset is not representative of modern, large-scale classification tasks.
Results should not be generalized beyond the included species and scenario.
