vit-base-mini-food-3

This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the custom Mini Food-3 dataset. After the final training epoch, it achieves the following results on the evaluation set (the 75-image validation split):

Loss: 0.5656
Accuracy: 0.9067

Model description

This is a Vision Transformer (ViT) model fine-tuned for food image classification. The model was trained to classify three food categories: pizza, sushi, and ice cream.

Intended uses & limitations

This model is intended for classifying images of three food types:

pizza
sushi
ice_cream

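It performs best on images similar to the training data, which is a subset of Food-101. Because the classifier has only three output classes, any input image is assigned one of these three labels, including images of other foods. Accuracy may also drop on images whose conditions (for example lighting or framing) differ from those in Food-101.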
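A minimal inference sketch for the intended use described above, using the Transformers image-classification pipeline. The function names `classify` and `top_prediction` are illustrative, and `model_id` is an assumption the caller must supply (the Hub repository ID or a local checkpoint path).

```python
from typing import Dict, List

# The three labels listed in this card.
LABELS = ["pizza", "sushi", "ice_cream"]


def top_prediction(predictions: List[Dict]) -> Dict:
    """Return the highest-scoring entry from pipeline-style output.

    The image-classification pipeline returns a list of
    {"label": str, "score": float} dicts.
    """
    return max(predictions, key=lambda p: p["score"])


def classify(image_path: str, model_id: str) -> Dict:
    """Classify one image with a fine-tuned checkpoint.

    model_id is supplied by the caller (assumption: a Hub repository ID
    or a local directory containing the fine-tuned weights).
    """
    # Imported lazily so top_prediction works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return top_prediction(classifier(image_path))


# Example call (hypothetical file name):
#   result = classify("my_photo.jpg", model_id="<this model's repository ID>")
#   print(result["label"], result["score"])
```

Since the model always returns one of the three labels, a low top score can serve as a rough signal that the image falls outside the supported categories.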

Training and evaluation data

The model was trained on the Mini Food-3 dataset, derived from the Food-101 dataset:

Split	Images per class	Total images
Train	100	300
Validation	25	75
Test	25	75
Total	150	450
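The card does not describe how the subset was drawn from Food-101, so the following is only a sketch of one way to produce a 100/25/25 split per class. The function name `split_class` and the use of seed 42 for sampling are assumptions, not the documented procedure.

```python
import random
from typing import Dict, List


def split_class(
    files: List[str],
    n_train: int = 100,
    n_val: int = 25,
    n_test: int = 25,
    seed: int = 42,  # assumption: reuses the training seed for sampling
) -> Dict[str, List[str]]:
    """Sample disjoint train/validation/test file lists for one class."""
    needed = n_train + n_val + n_test
    if len(files) < needed:
        raise ValueError(f"need {needed} files, got {len(files)}")
    # Sort first so the result does not depend on directory listing order.
    chosen = random.Random(seed).sample(sorted(files), needed)
    return {
        "train": chosen[:n_train],
        "validation": chosen[n_train:n_train + n_val],
        "test": chosen[n_train + n_val:],
    }
```

Applying this to each of the three classes yields 300 training, 75 validation, and 75 test images, matching the table above.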

Preprocessing

Images converted to RGB
Resizing and normalization using the ViT image processor
Automatic label encoding from folder structure
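The preprocessing steps above can be sketched as follows. The helper names are illustrative, the alphabetical ordering of class folders is an assumption, and the mean and standard deviation of 0.5 are assumed to match the base checkpoint's processor configuration; check the processor's own config before relying on them.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def build_label_maps(class_names: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map class names to integer ids (assumption: alphabetical order)."""
    ordered = sorted(class_names)
    label2id = {name: i for i, name in enumerate(ordered)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def class_names_from_folders(data_dir: str) -> List[str]:
    """Treat each sub-folder of data_dir as one class."""
    return [p.name for p in Path(data_dir).iterdir() if p.is_dir()]


def normalize_pixel(value: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Rescale an 8-bit channel value to [0, 1], then normalize it."""
    return (value / 255.0 - mean) / std


def load_and_preprocess(image_path: str, processor):
    """Open an image, force RGB, and apply a ViT image processor.

    processor is expected to be something like
    ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k").
    """
    # Imported lazily so the helpers above work without Pillow installed.
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    return processor(images=image, return_tensors="pt")
```

With these assumed statistics, channel values 0 and 255 map to -1.0 and 1.0, so the model sees inputs in the range [-1, 1].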

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
weight_decay: 0.01
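For readers reproducing this setup, the hyperparameters above map onto `transformers.TrainingArguments` roughly as sketched below. Two assumptions are made: training ran on a single device (so the per-device batch size equals the listed batch size), and no warmup was used, since none is listed. The helper `linear_lr` is illustrative and mirrors a linear decay schedule.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumption: single device
    "per_device_eval_batch_size": 8,   # assumption: single device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def build_training_arguments(output_dir: str):
    """Create TrainingArguments from the values above."""
    # Imported lazily so this module loads without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step: int, total_steps: int,
              base_lr: float = 2e-5, warmup_steps: int = 0) -> float:
    """Learning rate at a given step under linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the learning rate starts at 2e-05 and falls linearly to zero at the final optimizer step.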

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.9543	1.0	38	0.8170	0.8667
0.6416	2.0	76	0.6260	0.9067
0.4879	3.0	114	0.5656	0.9067
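The step counts and accuracies in this table can be checked against the dataset sizes with a little arithmetic. The correct-prediction counts below (65 and 68) are inferred from the reported accuracies under the assumption that evaluation used all 75 validation images; they are not stated in the card.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def accuracy(num_correct: int, num_total: int) -> float:
    """Fraction of correctly classified examples."""
    return num_correct / num_total


# 300 training images with batch size 8 give 38 steps per epoch.
STEPS = steps_per_epoch(300, 8)
# Three epochs give the final step count in the table.
TOTAL_STEPS = 3 * STEPS
# Inferred counts: 65/75 after epoch 1, 68/75 after epochs 2 and 3.
ACC_EPOCH_1 = round(accuracy(65, 75), 4)
ACC_EPOCH_3 = round(accuracy(68, 75), 4)
```

With only 75 validation images, each additional correct prediction moves accuracy by about 1.3 percentage points, so small differences between runs should be read with caution.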

Framework versions

Transformers 4.50.0
PyTorch 2.0+
Datasets 2.0+
Tokenizers 0.13+

Model Comparison

This fine-tuned ViT model was compared against:

CLIP (Zero-Shot): openai/clip-vit-large-patch14 - No training required
OpenAI Vision Model: LLM-based image classification

On this specific dataset, the fine-tuned ViT model outperformed both baseline approaches, which is attributed to its task-specific training.
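The card does not say how the CLIP baseline was run. One plausible setup, sketched here under that caveat, uses the Transformers zero-shot image-classification pipeline with the three class names as candidate labels. The function names and the underscore-to-space label rewriting are assumptions.

```python
from typing import Dict, List


def to_candidate_labels(labels: List[str]) -> Dict[str, str]:
    """Map human-readable candidate text back to the original label.

    Underscores are replaced with spaces so that "ice_cream" is
    presented to CLIP as "ice cream".
    """
    return {label.replace("_", " "): label for label in labels}


def clip_zero_shot(image_path: str, labels: List[str],
                   model_id: str = "openai/clip-vit-large-patch14") -> str:
    """Return the label CLIP scores highest for one image, with no training."""
    # Imported lazily so to_candidate_labels works without transformers.
    from transformers import pipeline

    candidates = to_candidate_labels(labels)
    classifier = pipeline("zero-shot-image-classification", model=model_id)
    results = classifier(image_path, candidate_labels=list(candidates))
    best = max(results, key=lambda r: r["score"])
    return candidates[best["label"]]


# Example call (hypothetical file name):
#   clip_zero_shot("my_photo.jpg", ["pizza", "sushi", "ice_cream"])
```

Running a function like this over the same evaluation images would give a zero-shot accuracy directly comparable to the fine-tuned model's.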

Model format: Safetensors
Model size: 85.8M params
Tensor type: F32

Model repository: zutaars1/vit-computer-vision-classification-model
Base model: google/vit-base-patch16-224-in21k