---
license: apache-2.0
---

General Information:

Used Dataset: cats_vs_dogs (https://huggingface.co/datasets/cats_vs_dogs)

Used Label: Images are randomly flipped; a flipped image is labeled 1, an unflipped image 0.

Used Library: PyTorch

Used Model: ResNet18 from torchvision

Number of classes: 2 (0 means No flip and 1 means Flipped image)

Train Test Split: 70-30

Some sample images and labels from the created dataset:

(Figure: sample images with flip labels)

Specific information about the Dataset:

The following files were removed from the dataset before training because they are broken/corrupted:

  • ./kagglecatsanddogs_3367a/PetImages/Cat/666.jpg
  • ./kagglecatsanddogs_3367a/PetImages/Cat/10404.jpg
  • ./kagglecatsanddogs_3367a/PetImages/Dog/11702.jpg

Training Information:

Total Epochs: 5

Pretrained: True (ImageNet weights; every layer is trainable)

Image Size: 224 x 224

Batch Size: 128

Optimizer: SGD

Learning Rate: 0.001 (Constant throughout the training)

Momentum: 0.9

Loss: CrossEntropyLoss

Results:

Accuracy: 98.4266%

F1: 98.4271%

Recall: 98.4261%

Precision: 98.4265%
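Metrics like these could be computed with scikit-learn along the following lines. This is a sketch: macro averaging is an assumption, since the averaging mode is not stated above.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate(y_true, y_pred):
    """Return percent metrics; macro averaging is an assumption."""
    return {
        "accuracy": 100 * accuracy_score(y_true, y_pred),
        "f1": 100 * f1_score(y_true, y_pred, average="macro"),
        "recall": 100 * recall_score(y_true, y_pred, average="macro"),
        "precision": 100 * precision_score(y_true, y_pred, average="macro"),
    }
```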

Confusion Matrix:

(Figure: confusion matrix)

Some Misclassified Images (Randomly Selected):

(Figure: randomly selected misclassified images)

Some possible improvements:

  • Most of the misclassified images are occluded by other objects or only partly visible. One possible improvement is to add more of these occluded or partially visible images to the training dataset.
  • Hyperparameter tuning is another option worth exploring. For example, instead of a constant learning rate, we could try a cyclical learning rate, which helps the model escape local minima.
  • The rightmost image in the figure above shows a cat in a pose that differs from most of the training images. Augmentation such as CutMix could be helpful in this scenario.
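The cyclical learning rate idea can be sketched with PyTorch's built-in `CyclicLR` scheduler. The bounds and step size below are illustrative assumptions, not tuned values.

```python
import torch
from torch import optim

# Stand-in model; in the actual setup this would be the ResNet18.
model = torch.nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Cycle the learning rate between 1e-4 and 1e-2. step_size_up is the number
# of iterations spent rising from base_lr to max_lr; all values here are
# illustrative.
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=200, mode="triangular"
)

for _ in range(10):  # in a real loop: forward, backward, then these two steps
    optimizer.step()
    scheduler.step()
```

Unlike the constant learning rate used above, the scheduler is stepped once per batch, so the learning rate sweeps up and down within each training run.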