Cat vs Dog Classifier 🐱🐶
A ResNet50 transfer-learning classifier that distinguishes cats from dogs at ~94% validation accuracy (AUC 0.98), trained in two stages on the Oxford-IIIT Pet dataset.
Full training code, Grad-CAM inference, and a complete beginner's guide: https://github.com/mtkl6/cat-dog-classifier
⚠️ The inference widget is disabled because this is a custom head on a torchvision backbone (not a `transformers` model); load it with the snippet below.
Files
| File | What |
|---|---|
| `cat_dog_classifier.pt` | trained weights (raw `state_dict`, ~90 MB) |
| `config.json` | architecture & preprocessing metadata |
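Because `cat_dog_classifier.pt` is a raw `state_dict` rather than a pickled model, it only loads into a module with exactly the same parameter names and shapes. The sketch below (plain PyTorch, no download) builds just the head used on top of ResNet50 and lists its keys; the `head` variable is illustrative only. Since `Dropout` has no parameters, the `Linear` sits at index 1 of the `Sequential`, so inside the full model the head weights are stored as `fc.1.weight` and `fc.1.bias`.

```python
import torch.nn as nn

# Rebuild only the classification head that replaces ResNet50's fc layer
head = nn.Sequential(nn.Dropout(0.4), nn.Linear(2048, 1))

# Dropout (index 0) owns no parameters, so only the Linear (index 1) appears
for name, tensor in head.state_dict().items():
    print(f"fc.{name}", tuple(tensor.shape))
```

A head defined differently (for example a bare `nn.Linear(2048, 1)`, whose keys would be `fc.weight` and `fc.bias`) makes `load_state_dict` fail with missing and unexpected keys under its default `strict=True`.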
Usage
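```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import models, transforms

# Rebuild the architecture: ResNet50 backbone + Dropout/Linear head with one output logit
model = models.resnet50()  # no pretrained weights needed; the checkpoint overwrites them
model.fc = nn.Sequential(nn.Dropout(0.4), nn.Linear(2048, 1))

# Download the raw state_dict; map_location="cpu" lets CPU-only machines load it
weights = hf_hub_download("mtkl6/cat-dog-classifier", "cat_dog_classifier.pt")
model.load_state_dict(torch.load(weights, map_location="cpu", weights_only=True))
model.eval()  # disables Dropout for inference

# Preprocessing: 224×224 resize + ImageNet normalization
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

x = tf(Image.open("pet.jpg").convert("RGB")).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    p_dog = torch.sigmoid(model(x)).item()  # probability of label 1 (dog)
print("dog" if p_dog > 0.5 else "cat", f"({max(p_dog, 1 - p_dog):.1%})")
```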
Labels: 0 = cat, 1 = dog. The model outputs a single logit; apply sigmoid
and threshold at 0.5.
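Because sigmoid is monotonic and sigmoid(0) = 0.5, thresholding the probability at 0.5 is equivalent to checking whether the raw logit is positive. A minimal sketch for a batch, assuming logits shaped `(batch, 1)` as the model returns; the logit values are arbitrary examples, not real model outputs:

```python
import torch

LABELS = ["cat", "dog"]  # 0 = cat, 1 = dog

# Arbitrary example logits for 4 images, shape (batch, 1) like model(x)
logits = torch.tensor([[-2.3], [0.7], [0.0], [4.1]])

p_dog = torch.sigmoid(logits).squeeze(1)  # P(dog) per image
preds = (p_dog > 0.5).long()              # threshold at 0.5 -> 0/1 labels

# sigmoid(z) > 0.5 exactly when z > 0, so the logit can be thresholded directly
assert torch.equal(preds, (logits.squeeze(1) > 0).long())

for p, y in zip(p_dog.tolist(), preds.tolist()):
    print(LABELS[y], f"P(dog)={p:.3f}")
```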
Training
| Component | Details |
|---|---|
| Backbone | ResNet50 (`IMAGENET1K_V1`), head `Dropout(0.4)` → `Linear(2048, 1)` |
| Stage 1 | frozen backbone, head only; lr 1e-3, 10 epochs → 86.3% val accuracy |
| Stage 2 | fine-tune `layer4`; lr 1e-5, 10 epochs → 94.2% val accuracy, AUC 0.98 |
| Loss / optim | `BCEWithLogitsLoss`, Adam, `ReduceLROnPlateau` |
| Input | 224×224 RGB, ImageNet normalization |
| Dataset | Oxford-IIIT Pet (37 breeds → binary cat/dog) |
Citation
```bibtex
@software{cat_dog_classifier_2026,
  author = {Moritz (mtkl6)},
  title  = {Cat vs Dog Classifier: a ResNet50 transfer-learning tutorial},
  year   = {2026},
  url    = {https://github.com/mtkl6/cat-dog-classifier}
}
```
License
Code & weights: MIT. Dataset: Oxford-IIIT Pet (Parkhi et al., 2012), used under its own research/educational terms.