Cat vs Dog Classifier 🐱🐶


A ResNet50 transfer-learning classifier that distinguishes cats from dogs at ~94% validation accuracy (AUC 0.98), trained in two stages on the Oxford-IIIT Pet dataset.

Full training code, Grad-CAM inference, and a complete beginner's guide: 👉 https://github.com/mtkl6/cat-dog-classifier

⚠️ The inference widget is disabled because this model is a custom head on a torchvision backbone, not a transformers model. Load it with the snippet below.

Files

File                    What
cat_dog_classifier.pt   trained weights (raw state_dict, ~90 MB)
config.json             architecture & preprocessing metadata

Usage

import torch, torch.nn as nn
from torchvision import models, transforms
from huggingface_hub import hf_hub_download
from PIL import Image

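# Rebuild the architecture: ResNet50 backbone with a Dropout -> Linear head (one logit).
model = models.resnet50()
model.fc = nn.Sequential(nn.Dropout(0.4), nn.Linear(2048, 1))
weights = hf_hub_download("mtkl6/cat-dog-classifier", "cat_dog_classifier.pt")
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load(weights, map_location="cpu", weights_only=True))
model.eval()  # disable dropout and use BatchNorm running statistics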

tf = transforms.Compose([
    transforms.Resize((224, 224)), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = tf(Image.open("pet.jpg").convert("RGB")).unsqueeze(0)  # add a batch dimension
with torch.no_grad():  # inference only, so skip gradient tracking
    p_dog = torch.sigmoid(model(x)).item()
print("dog" if p_dog > 0.5 else "cat", f"({max(p_dog, 1 - p_dog):.1%})")

Labels: 0 = cat, 1 = dog. The model outputs a single logit; apply sigmoid and threshold at 0.5.
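
The decoding rule above can be sketched in plain Python, which may help when post-processing logits outside of PyTorch. The helper name `decode_logit` and the `LABELS` dictionary are hypothetical and not part of the repository; the sigmoid is written in a numerically stable form so that very negative logits do not overflow.

```python
import math

# Label convention from this model card: 0 = cat, 1 = dog.
LABELS = {0: "cat", 1: "dog"}


def decode_logit(logit: float, threshold: float = 0.5) -> tuple[str, float]:
    """Turn the model's single raw logit into (label, confidence).

    Hypothetical helper for illustration; not part of the repository.
    """
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if logit >= 0:
        p_dog = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        p_dog = e / (1.0 + e)

    # Probabilities strictly above the threshold are labelled "dog".
    label = LABELS[int(p_dog > threshold)]
    # Report the probability of whichever class was predicted.
    confidence = p_dog if label == "dog" else 1.0 - p_dog
    return label, confidence
```

For example, `decode_logit(model(x).item())` would give the same label as the snippet in the Usage section. Raising `threshold` trades dog recall for dog precision; 0.5 is the value this card specifies.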

Training

Backbone       ResNet50 (IMAGENET1K_V1), head Dropout(0.4) → Linear(2048, 1)
Stage 1        frozen backbone, head only: lr 1e-3, 10 epochs → 86.3% val
Stage 2        fine-tune layer4: lr 1e-5, 10 epochs → 94.2% val, AUC 0.98
Loss / optim   BCEWithLogitsLoss, Adam, ReduceLROnPlateau
Input          224×224 RGB, ImageNet normalization
Dataset        Oxford-IIIT Pet (37 breeds → binary)

Citation

@software{cat_dog_classifier_2026,
  author = {Moritz (mtkl6)},
  title  = {Cat vs Dog Classifier: a ResNet50 transfer-learning tutorial},
  year   = {2026},
  url    = {https://github.com/mtkl6/cat-dog-classifier}
}

License

Code & weights: MIT. Dataset: Oxford-IIIT Pet (Parkhi et al., 2012), used under its own research/educational terms.
