Update README.md
@@ -18,6 +18,30 @@ Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 2
Finally, the ViT was fine-tuned on the Chaoyang dataset at a resolution of 384x384, using a fixed 10% of the training set as the validation set, and evaluated on the official test set using the model with the lowest validation loss.

# Augmentation pipeline
To address class imbalance in the training set, we oversampled with repetition.
Specifically, we duplicated images from the minority classes until all classes were evenly represented.
This enlarged the training set but ensured that the model saw an equal number of samples from each class during training.
We checked that this did not lead to overfitting or other issues by using a validation set that kept the original class distribution.
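
The oversampling step described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function name `oversample_with_repetition`, the file names, and the toy labels are all hypothetical, and it assumes the validation split has already been set aside so that it keeps the original class distribution.

```python
import random
from collections import Counter, defaultdict


def oversample_with_repetition(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is grown to the size of the largest class.
    target = max(len(idxs) for idxs in by_class.values())

    balanced = []
    for idxs in by_class.values():
        # Keep each original sample once, then draw the extras with replacement.
        extra = rng.choices(idxs, k=target - len(idxs))
        balanced.extend(idxs + extra)

    rng.shuffle(balanced)
    return [samples[i] for i in balanced], [labels[i] for i in balanced]


# Toy training split (hypothetical file names and labels): 6 / 3 / 1 images per class.
paths = [f"img_{i}.png" for i in range(10)]
classes = [0] * 6 + [1] * 3 + [2] * 1

bal_paths, bal_classes = oversample_with_repetition(paths, classes)
print(Counter(bal_classes))  # per-class counts after balancing
```

Because the duplicates are exact copies, the random augmentations applied at load time are what make repeated samples differ from one another during training.
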
We used the following augmentation pipeline for our experiments:

    A.Resize(img_size, img_size),
    A.RandomResizedCrop(img_size, img_size, scale=(0.5, 1.0), p=0.5),

This pipeline consists of the following transformations:

- Resize: resizes the image to a fixed size of (img_size, img_size).
- HorizontalFlip: flips the image horizontally with a probability of 0.5.
- VerticalFlip: flips the image vertically with a probability of 0.5.
- RandomRotate90: rotates the image by a random multiple of 90 degrees, with a probability of 0.5.
- RandomResizedCrop: with a probability of 0.5, crops a random region covering between 50% and 100% of the image area and resizes it to (img_size, img_size).
- ToTensorV2: converts the image to a PyTorch tensor.

These transformations were chosen to add geometric variety to the dataset while preserving important visual features.
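
To make the semantics of these geometric transformations concrete, here is a dependency-free sketch that applies them to a nested-list "image". It is an illustration only, not the albumentations implementation: the helper names (`hflip`, `vflip`, `rot90`, `sample_crop_box`, `augment`) are hypothetical, the crop keeps a square aspect ratio for simplicity, the rotation factor is assumed to be drawn uniformly from 0 to 3, and the final resize back to (img_size, img_size) is left out.

```python
import math
import random


def hflip(img):
    """Mirror left-right: reverse each row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom: reverse the row order."""
    return [row[:] for row in img[::-1]]


def rot90(img, k):
    """Rotate counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        img = [list(col) for col in zip(*img)][::-1]
    return img


def sample_crop_box(height, width, scale=(0.5, 1.0), rng=random):
    """Pick a crop window covering a random fraction of the image area."""
    frac = rng.uniform(*scale)
    # Area grows with the square of the side, so each side shrinks by sqrt(frac).
    h = max(1, round(height * math.sqrt(frac)))
    w = max(1, round(width * math.sqrt(frac)))
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return top, left, h, w


def augment(img, rng, p=0.5):
    """Apply each transformation independently with probability p."""
    if rng.random() < p:
        img = hflip(img)
    if rng.random() < p:
        img = vflip(img)
    if rng.random() < p:
        img = rot90(img, rng.randint(0, 3))
    if rng.random() < p:
        top, left, h, w = sample_crop_box(len(img), len(img[0]), rng=rng)
        img = [row[left:left + w] for row in img[top:top + h]]
        # A real pipeline would resize the crop back to the target size here.
    return img


# 4x4 toy image whose pixel values encode their original position.
toy = [[4 * r + c for c in range(4)] for r in range(4)]
print(augment(toy, random.Random(0)))
```

Note that a scale range of (0.5, 1.0) refers to area: at the lower bound, a square crop still spans about 71% of each side of the image.
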
# Results

Our model represents the current state of the art in the field, as it outperforms the previous state-of-the-art models listed on Papers with Code,