---
license: apache-2.0
datasets:
- scikit-learn/iris
metrics:
- accuracy
library_name: pytorch
pipeline_tag: tabular-classification
---

# logistic-regression-iris

A logistic regression model trained on the Iris dataset.

It takes two inputs, `PetalLengthCm` and `PetalWidthCm`, and predicts whether the species is `Iris-setosa`.

It is a PyTorch adaptation of the scikit-learn model in Chapter 10 of Aurélien Géron's book *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow*.

Code: https://github.com/sambitmukherjee/handson-ml3-pytorch/blob/main/chapter10/logistic_regression_iris.ipynb

Experiment tracking: https://wandb.ai/sadhaklal/logistic-regression-iris

## Usage

```python
!pip install -q datasets

from datasets import load_dataset

iris = load_dataset("scikit-learn/iris")
iris.set_format("pandas")
iris_df = iris['train'][:]
X = iris_df[['PetalLengthCm', 'PetalWidthCm']]
y = (iris_df['Species'] == "Iris-setosa").astype(int)

class_names = ["Not Iris-setosa", "Iris-setosa"]

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X.values, y.values, test_size=0.3, stratify=y, random_state=42)
X_means, X_stds = X_train.mean(axis=0), X_train.std(axis=0)

import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

device = torch.device("cpu")

class LinearModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 1)

    def forward(self, x):
        out = self.fc(x)
        return out

model = LinearModel.from_pretrained("sadhaklal/logistic-regression-iris")
model.to(device)

# Inference on new data:
import numpy as np

X_new = np.array([[2.0, 0.5], [3.0, 1.0]])  # Contains data on 2 new flowers.
X_new = (X_new - X_means) / X_stds  # Normalize using the training-set statistics.
X_new = torch.from_numpy(X_new).float()

model.eval()
X_new = X_new.to(device)
with torch.no_grad():
    logits = model(X_new)
    proba = torch.sigmoid(logits.squeeze())
    preds = (proba > 0.5).long()

print(f"Predicted classes: {preds}")
print(f"Predicted probabilities of being Iris-setosa: {proba}")
```

## Metric

As shown above, the validation set contains 30% of the examples (selected at random in a stratified fashion).

Accuracy on the validation set: 1.0
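
The accuracy above can be reproduced by running the model on `X_val` (normalized with `X_means` and `X_stds`) and comparing predictions to `y_val`. A minimal, self-contained sketch of the comparison step, using illustrative tensors rather than the actual validation split:

```python
import torch

# Illustrative labels and predicted classes (not the actual validation data).
y_val = torch.tensor([1, 0, 1, 1, 0, 0])
preds = torch.tensor([1, 0, 1, 1, 0, 0])

# Accuracy = fraction of predictions that match the labels.
acc = (preds == y_val).float().mean().item()
print(f"Accuracy: {acc}")  # Accuracy: 1.0
```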
|