Swin-V2-base-Food

This model is a fine-tuned version of microsoft/swinv2-base-patch4-window8-256 on the ItsNotRohit/Food121-224 dataset. It achieves the following results on the evaluation set:

Loss: 0.7099
Accuracy: 0.8160
Recall: 0.8160
Precision: 0.8168
F1: 0.8159
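The card does not say how the per-class scores were averaged. In every evaluation row, recall equals accuracy. A support-weighted average (scikit-learn's `average="weighted"`) always produces that result. Precision differs from accuracy, which rules out micro averaging, because micro precision would also equal accuracy. Below is a minimal pure-Python sketch of weighted metrics, assuming that averaging mode. The function name `weighted_metrics` and the toy labels are illustrative only.

```python
def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), with per-class scores
    averaged using each class's support (number of true samples) as weight."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        support = sum(t == c for t in y_true)      # true samples of class c
        predicted = sum(p == c for p in y_pred)    # samples predicted as c
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support if support else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Toy example with hypothetical labels
y_true = ["pizza", "pizza", "sushi", "ramen"]
y_pred = ["pizza", "sushi", "sushi", "ramen"]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(acc, rec)  # 0.75 0.75: weighted recall equals accuracy
```

Each class's recall is weighted by that class's share of the true labels. The weighted sum therefore reduces to total true positives divided by total samples, which is exactly accuracy.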

Model description

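Swin Transformer V2 (Swin v2) is a hierarchical vision Transformer that reaches strong accuracy on image classification and other vision tasks. Its key design features are:

Hierarchical architecture: neighbouring patch tokens are merged from stage to stage, producing multi-scale feature maps in the way a CNN does.
Shifted windows: self-attention is computed inside local windows, so its cost grows linearly with image size instead of quadratically. The window grid is shifted between consecutive layers so that information can flow across window borders.
Scalable capacity: V2 adds residual post-normalization, scaled cosine attention and a log-spaced continuous position bias. These changes stabilize training of very large models and ease transfer to higher input resolutions. This checkpoint is the base-size variant.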
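The following minimal NumPy sketch illustrates the windowing idea. It partitions a feature map into non-overlapping windows and applies the cyclic shift used in alternate layers. The helper names are hypothetical. The sketch leaves out the attention computation itself. It also leaves out the attention mask that the real implementation applies after the shift, which stops tokens that wrapped around the border from attending to each other. The example size assumes this checkpoint's first stage: a 256×256 input with 4×4 patches gives a 64×64 token grid, which is split into 8×8 windows.

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping window x window
    blocks; returns shape (num_windows, window * window, C)."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be multiples of window"
    x = x.reshape(h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (window rows, window cols, window, window, C)
    return x.reshape(-1, window * window, c)


def shifted_window_partition(x, window):
    """Cyclically shift the map up and left by half a window before
    partitioning, so the new windows straddle the previous window borders."""
    shift = window // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window)


# 64x64 token grid with 3 toy channels, 8x8 windows
tokens = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
windows = window_partition(tokens, 8)
shifted = shifted_window_partition(tokens, 8)
print(windows.shape)  # (64, 64, 3): 64 windows of 64 tokens each

# Pairwise attention scores: global attention vs. windowed attention
global_pairs = (64 * 64) ** 2
windowed_pairs = windows.shape[0] * windows.shape[1] ** 2
print(global_pairs // windowed_pairs)  # 64
```

On this grid, windowed attention computes 64 times fewer pairwise scores than global attention. That ratio equals the number of windows, and it grows as the input resolution grows.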

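The largest Swin V2 model (SwinV2-G, about 3 billion parameters) set new records on ImageNet-V2 image classification, COCO object detection, ADE20K semantic segmentation and Kinetics-400 video action classification. It did so while using roughly 40x less labelled data and 40x less training time than comparable billion-parameter models. The V2 design also supports training on high-resolution inputs of up to 1,536×1,536 pixels.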

The model was fine-tuned on food images from 120 categories.

To run inference with the model, use the following code snippet:

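from transformers import pipeline
from PIL import Image

# Initialize the image classification pipeline with the fine-tuned checkpoint
classifier = pipeline("image-classification", model="arnabdhar/Swin-V2-base-Food")

# Run inference on a local image (replace with the path to your own image)
image_path = "food.jpg"
image = Image.open(image_path)
results = classifier(image)  # list of {"label": ..., "score": ...} dicts, highest score first
print(results)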
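The pipeline hides the step that turns the model's raw logits into the label/score list it returns. The following minimal NumPy sketch shows that step. It assumes single-label softmax scoring and the pipeline's default of returning the top 5 labels. `logits_to_predictions` and the toy `id2label` mapping are hypothetical stand-ins for the model output and `model.config.id2label`.

```python
import numpy as np


def logits_to_predictions(logits, id2label, top_k=5):
    """Convert a 1-D logit vector into [{"label", "score"}] sorted by score."""
    z = logits - logits.max()            # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax over classes
    top = np.argsort(probs)[::-1][:top_k]
    return [{"label": id2label[int(i)], "score": float(probs[i])} for i in top]


# Toy 3-class example with hypothetical labels
id2label = {0: "pizza", 1: "sushi", 2: "ramen"}
preds = logits_to_predictions(np.array([0.5, 2.0, 1.0]), id2label, top_k=2)
print([p["label"] for p in preds])  # ['sushi', 'ramen']
```

Because softmax preserves the ordering of the logits, the top label is always the one with the largest logit. The scores only add a calibrated-looking probability on top.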

Intended uses

The model can be used for the following tasks:

Food image classification: classify food images with the Transformers pipeline API, as in the snippet above.
Base model for fine-tuning: use this model as a starting point and fine-tune it further on your own custom dataset.
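When this checkpoint is reused as a base model, the existing food classification head generally has to be replaced to match the new label set. The following is a minimal sketch. It assumes the standard Transformers `from_pretrained` arguments `num_labels`, `id2label`, `label2id` and `ignore_mismatched_sizes`. The helper names and the class list are hypothetical. The import sits inside the loader so that the label-mapping helper can run without Transformers installed.

```python
def build_label_maps(class_names):
    """Map each class name to a contiguous integer id and back."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_for_new_labels(class_names, checkpoint="arnabdhar/Swin-V2-base-Food"):
    """Load the fine-tuned backbone with a classification head sized for
    `class_names`. Downloads weights from the Hugging Face Hub on first use."""
    from transformers import AutoModelForImageClassification

    id2label, label2id = build_label_maps(class_names)
    return AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,  # re-initialize head weights whose shape no longer matches
    )


# Hypothetical custom dataset with three classes
id2label, label2id = build_label_maps(["dosa", "idli", "vada"])
print(label2id)  # {'dosa': 0, 'idli': 1, 'vada': 2}
```

The model returned by `load_for_new_labels` could then be trained on the new dataset with the usual `Trainer` workflow.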

Training procedure

Fine-tuning was done on Google Colab with an NVIDIA T4 GPU (15 GB of VRAM). The model was trained for 20,000 steps, which took about 5.5 hours including periodic evaluation.
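These figures allow a back-of-the-envelope throughput estimate, using the training batch size of 16 listed under the hyperparameters. The wall-clock time includes evaluation, so the training-only rate is somewhat higher than this estimate.

```latex
\frac{5.5\,\text{h} \times 3600\,\text{s/h}}{20{,}000\ \text{steps}}
  = \frac{19{,}800\,\text{s}}{20{,}000\ \text{steps}}
  \approx 0.99\ \text{s/step},
\qquad
\frac{16\ \text{images/step}}{0.99\ \text{s/step}} \approx 16\ \text{images/s}
```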

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 128
seed: 17769929
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.01
training_steps: 20000
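The cosine schedule with a 1% warmup ratio can be sketched as follows. This assumes the schedule behaves like Transformers' `get_cosine_schedule_with_warmup` with its default half cosine cycle. It also assumes the warmup length is the warmup ratio times the total steps, rounded up, which gives 200 steps here. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 5e-5
TOTAL_STEPS = 20_000
WARMUP_STEPS = math.ceil(0.01 * TOTAL_STEPS)  # lr_scheduler_warmup_ratio * training_steps = 200


def lr_at(step):
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 100, 200, 10_100, 20_000):
    print(s, lr_at(s))
```

Under these assumptions, the learning rate reaches its 5e-05 peak after 200 steps. It falls to half the peak midway through the decay (step 10,100) and approaches zero at step 20,000.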

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Recall	Precision	F1
1.5169	0.33	2000	1.2680	0.6746	0.6746	0.7019	0.6737
1.2362	0.66	4000	1.0759	0.7169	0.7169	0.7411	0.7178
1.1076	0.99	6000	0.9757	0.7437	0.7437	0.7593	0.7430
0.9163	1.32	8000	0.9123	0.7623	0.7623	0.7737	0.7628
0.8291	1.65	10000	0.8397	0.7807	0.7807	0.7874	0.7796
0.7949	1.98	12000	0.7724	0.7965	0.7965	0.8014	0.7965
0.6455	2.31	14000	0.7458	0.8030	0.8030	0.8069	0.8031
0.6332	2.64	16000	0.7222	0.8110	0.8110	0.8122	0.8106
0.6132	2.98	18000	0.7021	0.8154	0.8154	0.8170	0.8155
0.57	3.31	20000	0.7099	0.8160	0.8160	0.8168	0.8159
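Two things can be read off the table. First, validation loss is lowest at step 18,000, while accuracy and F1 are highest at step 20,000. Second, the epoch column gives a rough training-set size. The following short Python sketch uses rows transcribed from the table. The size estimate assumes a single GPU and no gradient accumulation. It is only approximate because the epoch values are rounded to two decimals.

```python
# Rows transcribed from the training-results table:
# (step, epoch, validation loss, accuracy, f1)
ROWS = [
    (2000, 0.33, 1.2680, 0.6746, 0.6737),
    (4000, 0.66, 1.0759, 0.7169, 0.7178),
    (6000, 0.99, 0.9757, 0.7437, 0.7430),
    (8000, 1.32, 0.9123, 0.7623, 0.7628),
    (10000, 1.65, 0.8397, 0.7807, 0.7796),
    (12000, 1.98, 0.7724, 0.7965, 0.7965),
    (14000, 2.31, 0.7458, 0.8030, 0.8031),
    (16000, 2.64, 0.7222, 0.8110, 0.8106),
    (18000, 2.98, 0.7021, 0.8154, 0.8155),
    (20000, 3.31, 0.7099, 0.8160, 0.8159),
]

best_loss_step = min(ROWS, key=lambda r: r[2])[0]
best_acc_step = max(ROWS, key=lambda r: r[3])[0]

# Use the last row: its rounded epoch value has the smallest relative error.
steps_per_epoch = ROWS[-1][0] / ROWS[-1][1]
approx_train_images = steps_per_epoch * 16  # train_batch_size = 16

print(best_loss_step, best_acc_step)  # 18000 20000
print(round(steps_per_epoch), round(approx_train_images))
```

The final checkpoint therefore trades a slightly higher validation loss for the best accuracy. Each epoch is roughly 6,000 steps, or on the order of 97,000 training images.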

Framework versions

Transformers 4.35.2
Pytorch 2.1.0+cu121
Datasets 2.15.0
Tokenizers 0.15.0
