Snarci
/

ViT-base-patch16-384-Chaoyang-finetuned


Snarci committed on Apr 21, 2023

Commit f0d65e5 • 1 Parent(s): 5e22626

Update README.md

Files changed (1)

README.md +24 -0

@@ -18,6 +18,30 @@ Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 2
 Finally, the ViT was finetuned on the Chaoyang dataset at a resolution of 384x384, using a fixed 10% of the training set as the validation set. It was evaluated on the official test set using the checkpoint with the lowest validation loss.
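
The split and model-selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the seed value, and the use of a plain random (non-stratified) split are all assumptions.

```python
import random


def fixed_split(indices, val_fraction=0.1, seed=42):
    """Split sample indices into (train, val) with a fixed seed.

    The seed (42) and the unstratified shuffle are assumptions; the README
    only states that a fixed 10% of the training set was held out.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def best_checkpoint(val_losses):
    """Return the index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


train_idx, val_idx = fixed_split(range(100))
print(len(train_idx), len(val_idx))        # 90 training and 10 validation indices
print(best_checkpoint([0.9, 0.4, 0.6]))    # epoch index 1 has the lowest loss
```

Fixing the seed makes the split reproducible across runs, so validation losses from different experiments remain comparable.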
+# Augmentation pipeline
+To address the issue of class imbalance in our training set, we performed oversampling with repetition.
+Specifically, we duplicated minority-class images until all classes were equally represented.
+This resulted in a larger training set, but ensured that our model was exposed to an equal number of samples from each class during training.
+We verified that this approach did not lead to overfitting or other issues by using a validation set with the original class distribution.
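
As a rough illustration of oversampling with repetition, the sketch below duplicates minority-class samples until every class matches the size of the largest one. The function name, the `(item, label)` pair layout, and the choice to draw duplicates at random are assumptions; the README does not say how the duplicates were selected.

```python
import random
from collections import Counter, defaultdict


def oversample_with_repetition(samples, seed=0):
    """Balance classes by repeating minority-class samples.

    `samples` is a list of (item, label) pairs (hypothetical layout).
    Duplicates are drawn at random with replacement, which is an assumption.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)  # keep every original sample once
        balanced.extend(rng.choices(group, k=target - len(group)))  # add repeats
    rng.shuffle(balanced)
    return balanced


# Toy example: 6 images of class 0 and 2 images of class 1.
toy = [(f"img_{i}.jpg", 0) for i in range(6)] + [(f"img_{i}.jpg", 1) for i in range(6, 8)]
balanced = oversample_with_repetition(toy)
print(Counter(label for _, label in balanced))  # each class now has 6 samples
```

Only the training split should be balanced this way. The validation set keeps its original class distribution, as described above.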
+We used the following augmentation pipeline for our experiments:
+A.Resize(img_size, img_size),
+A.HorizontalFlip(p=0.5),
+A.VerticalFlip(p=0.5),
+A.RandomRotate90(p=0.5),
+A.RandomResizedCrop(img_size, img_size, scale=(0.5, 1.0), p=0.5),
+ToTensorV2(p=1.0)
+This pipeline consists of the following transformations:
+- Resize: resizes the image to a fixed size of (img_size, img_size).
+- HorizontalFlip: flips the image horizontally with a probability of 0.5.
+- VerticalFlip: flips the image vertically with a probability of 0.5.
+- RandomRotate90: with a probability of 0.5, rotates the image by a randomly chosen multiple of 90 degrees.
+- RandomResizedCrop: with a probability of 0.5, crops a random region covering between 50% and 100% of the image area and resizes it to (img_size, img_size).
+- ToTensorV2: converts the image from a NumPy array to a PyTorch tensor.
+These transformations were chosen because they vary the geometry of each image while preserving its important visual features.
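
To make the `scale=(0.5, 1.0)` argument of RandomResizedCrop more concrete, here is a simplified sketch of how a crop size can be sampled. In libraries such as Albumentations and torchvision, `scale` is interpreted as a fraction of the image area rather than of its side length. The function name, the aspect-ratio range of (3/4, 4/3), the retry count, and the fall-back to the full image are assumptions for illustration, not the library's exact implementation.

```python
import math
import random


def sample_crop_size(height, width, scale=(0.5, 1.0), ratio=(3 / 4, 4 / 3), rng=None):
    """Sample a crop (h, w) whose area is a `scale` fraction of the image.

    Simplified sketch: the ratio range and the fallback are assumptions.
    """
    rng = rng or random.Random()
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = rng.uniform(*scale) * area          # area fraction, not side length
        aspect = math.exp(rng.uniform(log_lo, log_hi))    # width / height, log-uniform
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            return h, w
    return height, width  # fallback: keep the whole image


rng = random.Random(0)
h, w = sample_crop_size(384, 384, rng=rng)
print(h, w, round(h * w / (384 * 384), 2))  # crop size and its share of the image area
```

After cropping, the region is resized back to (img_size, img_size), so the model always receives inputs of the same resolution.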
 # Results
 Our model represents the current state of the art in the field, as it outperforms the previous state-of-the-art models listed on Papers with Code,