---
license: apache-2.0
---
General Information:
Used Dataset: cats_vs_dogs (https://huggingface.co/datasets/cats_vs_dogs)
Used Label: Each image is randomly flipped; flipped images are labeled 1, unflipped images 0.
Used Library: PyTorch
Used Model: ResNet18 from torchvision
Number of classes: 2 (0 means No flip and 1 means Flipped image)
Train Test Split: 70-30
Some sample images and labels from the created dataset:
Specific information about the Dataset:
The following files were removed from the dataset before training because they are broken or corrupted:
- ./kagglecatsanddogs_3367a/PetImages/Cat/666.jpg
- ./kagglecatsanddogs_3367a/PetImages/Cat/10404.jpg
- ./kagglecatsanddogs_3367a/PetImages/Dog/11702.jpg
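One way to detect such corrupted files up front is to let PIL verify every image before training. This is a hypothetical helper, not the card's code; note that `Image.verify()` is a cheap integrity check and may miss some forms of corruption.

```python
from pathlib import Path

from PIL import Image


def find_valid_images(root):
    """Split image paths under `root` into (valid, broken) based on a PIL
    integrity check; broken files can then be excluded from training."""
    valid, broken = [], []
    for p in sorted(Path(root).rglob("*.jpg")):
        try:
            with Image.open(p) as img:
                img.verify()  # raises on truncated or corrupt files
            valid.append(p)
        except Exception:
            broken.append(p)
    return valid, broken
```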
Training Information:
Total Epoch: 5
Pretrained: True (ImageNet weights; every layer is trainable)
Image Size: 224 x 224
Batch Size: 128
Optimizer: SGD
Learning Rate: 0.001 (Constant throughout the training)
Momentum: 0.9
Loss: Cross-entropy loss
Result:
Accuracy: 98.4266%
F1: 98.4271%
Recall: 98.4261%
Precision: 98.4265%
Confusion Matrix:
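The metrics above can be reproduced from the test-set predictions with scikit-learn. This is a sketch, not the card's evaluation code; in particular, macro averaging for F1/recall/precision is an assumption (the near-identical values across the four metrics are consistent with it, but the card does not state the averaging mode).

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def evaluate(y_true, y_pred):
    """Return the reported metrics (as percentages) and the confusion matrix.
    `average="macro"` is assumed, not confirmed by the card."""
    return {
        "accuracy": 100 * accuracy_score(y_true, y_pred),
        "f1": 100 * f1_score(y_true, y_pred, average="macro"),
        "recall": 100 * recall_score(y_true, y_pred, average="macro"),
        "precision": 100 * precision_score(y_true, y_pred, average="macro"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```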
Some Misclassified Images (Randomly Selected):
Some possible improvements:
- Most of the misclassified images are occluded by other objects or only partly visible. One possible improvement would be to add more such occluded or partially visible images to the training set.
- Hyperparameter tuning is another option; we could check whether it improves performance. For example, instead of a constant learning rate we could try a cyclical learning rate, which helps the model escape local minima.
- If we consider the rightmost image in the figure above, the cat's pose differs from most images in the training set. An augmentation such as CutMix could be helpful in this scenario.
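The cyclical-learning-rate idea above is directly available in PyTorch as `torch.optim.lr_scheduler.CyclicLR`. The sketch below shows how it could replace the constant rate; the `base_lr`/`max_lr`/`step_size_up` values are illustrative assumptions, and the `nn.Linear` stands in for the ResNet18.

```python
import torch
from torch.optim.lr_scheduler import CyclicLR

model = torch.nn.Linear(10, 2)  # stand-in for the ResNet18
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Hypothetical bounds: cycle the LR between 1e-4 and 1e-2 over 500
# up-steps, then back down; values would need tuning for this task.
scheduler = CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2,
                     step_size_up=500, mode="triangular")

# Inside the training loop, step once per batch after optimizer.step():
#   optimizer.step()
#   scheduler.step()
```

Unlike the constant-rate setup, `scheduler.step()` is called per batch rather than per epoch, so the cycle length is measured in batches.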