vit-base-mini-food-3

This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the Mini Food-3 custom dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5656
  • Accuracy: 0.9067

Model description

This is a Vision Transformer (ViT) model fine-tuned for food image classification. The model was trained to classify three food categories: pizza, sushi, and ice cream.

Intended uses & limitations

This model is intended for classifying images of three food types:

  • pizza
  • sushi
  • ice_cream

It performs best on images similar to its training data, a subset of Food-101. Because it only knows these three classes, it assigns one of them to any input, and it may not generalize well to other food categories or to images taken under different conditions.
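A minimal inference sketch, assuming the checkpoint is loaded through the Transformers `image-classification` pipeline. The repository ID below is taken from the hosting page and may need adjusting; `top_label` and its 0.5 threshold are hypothetical helpers for illustration, not part of the model.

```python
from typing import Dict, List

# Assumption: repository ID as shown on the hosting page; change it if the model lives elsewhere.
MODEL_ID = "zutaars1/vit-computer-vision-classification-model"


def classify(image_path: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Return the pipeline's predictions: a list of {"label", "score"} dicts."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return classifier(image_path)


def top_label(predictions: List[Dict], min_score: float = 0.5) -> str:
    """Pick the best label, or "uncertain" when the top score is low.

    The 0.5 threshold is an arbitrary illustration, not a tuned value.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else "uncertain"
```

Something like `top_label(classify("photo.jpg"))` would then return `pizza`, `sushi`, `ice_cream`, or `uncertain`. The threshold is only a crude guard: a three-way classifier can still be confident on images that show none of the three foods.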

Training and evaluation data

The model was trained on the Mini Food-3 dataset, derived from the Food-101 dataset:

| Split      | Images per class | Total images |
|------------|------------------|--------------|
| Train      | 100              | 300          |
| Validation | 25               | 75           |
| Test       | 25               | 75           |
| Total      | 150              | 450          |
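The card does not say how the subset was sampled from Food-101. The following is one possible sketch, assuming a seeded shuffle per class followed by a 100/25/25 cut; `split_class` is a hypothetical helper, and reusing seed 42 from the training setup is an assumption.

```python
import random
from typing import Dict, List


def split_class(files: List[str], n_train: int = 100, n_val: int = 25,
                n_test: int = 25, seed: int = 42) -> Dict[str, List[str]]:
    """Shuffle one class's files and cut them into train/validation/test."""
    needed = n_train + n_val + n_test
    if len(files) < needed:
        raise ValueError(f"need {needed} files, got {len(files)}")
    shuffled = sorted(files)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded, independent of global RNG state
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:needed],
    }
```

Applied to each of the three classes, this yields 150 images per class and 450 in total, with no image shared between splits.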

Preprocessing

  • Images converted to RGB
  • Resizing and normalization using ViT image processor
  • Automatic label encoding from folder structure
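The label-encoding step can be sketched as follows, assuming one sub-folder per class and alphabetical ordering of class names (the ordering is an assumption; the card does not state the mapping). `encode_labels` is a hypothetical helper name.

```python
from pathlib import PurePath
from typing import Dict, List, Tuple


def encode_labels(paths: List[str]) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Derive integer labels from each file's parent folder name."""
    class_names = sorted({PurePath(p).parent.name for p in paths})
    label2id = {name: i for i, name in enumerate(class_names)}
    samples = [(p, label2id[PurePath(p).parent.name]) for p in paths]
    return samples, label2id
```

With folders named `pizza`, `sushi`, and `ice_cream`, this gives `ice_cream` the ID 0, `pizza` 1, and `sushi` 2. The same mapping (and its inverse) would need to be stored with the model so predictions can be turned back into names.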

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
  • weight_decay: 0.01
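To illustrate how these settings interact, here is a small sketch of the step count and the linear learning-rate schedule. It assumes a single device, no gradient accumulation, and no warmup, none of which the card specifies; `linear_lr` is a hypothetical stand-in for the library scheduler.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer updates with linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# 300 training images at batch size 8: the last batch is partial, so round up.
steps_per_epoch = math.ceil(300 / 8)   # 38
total_steps = steps_per_epoch * 3      # 3 epochs -> 114
```

Under these assumptions the rate starts at 2e-5, is halved by step 57, and reaches zero at step 114, which agrees with the 38, 76, and 114 step counts reported in the training results.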

Training results

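| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 0.9543        | 1.0   | 38   | 0.8170          | 0.8667   |
| 0.6416        | 2.0   | 76   | 0.6260          | 0.9067   |
| 0.4879        | 3.0   | 114  | 0.5656          | 0.9067   |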
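Because the evaluation set is small, it can help to translate the accuracies back into image counts. The sketch below assumes the accuracy column was computed on the 75-image validation split; `correct_count` is a hypothetical helper.

```python
VAL_SIZE = 75  # assumption: accuracy was measured on the validation split


def correct_count(accuracy: float, n_examples: int = VAL_SIZE) -> int:
    """Recover the number of correct predictions from a rounded accuracy."""
    return round(accuracy * n_examples)


for epoch, acc in [(1, 0.8667), (2, 0.9067), (3, 0.9067)]:
    hits = correct_count(acc)
    print(f"epoch {epoch}: {hits}/{VAL_SIZE} correct, {VAL_SIZE - hits} errors")
```

This gives 65 of 75 correct after epoch 1 and 68 of 75 after epochs 2 and 3. Each validation image is worth about 1.3 percentage points, so the gain from 0.8667 to 0.9067 corresponds to three additional correct images.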

Framework versions

  • Transformers 4.50.0
  • PyTorch 2.0+
  • Datasets 2.0+
  • Tokenizers 0.13+

Model Comparison

This fine-tuned ViT model was compared against:

  • CLIP (Zero-Shot): openai/clip-vit-large-patch14 - No training required
  • OpenAI Vision Model: LLM-based image classification

On this specific dataset, the fine-tuned ViT model outperforms both baselines, which is attributed to its task-specific training.
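For context, a zero-shot CLIP baseline of this kind could look roughly like the sketch below. It assumes the Transformers `zero-shot-image-classification` pipeline with its default prompt template; the card does not say how CLIP was actually prompted, and `readable_labels` and `clip_zero_shot` are hypothetical helpers.

```python
from typing import Dict, List

CLASS_NAMES = ["pizza", "sushi", "ice_cream"]


def readable_labels(class_names: List[str]) -> Dict[str, str]:
    """Map a human-readable label (for the text prompt) back to each class name."""
    return {name.replace("_", " "): name for name in class_names}


def clip_zero_shot(image_path: str, class_names: List[str] = CLASS_NAMES) -> str:
    """Return the class whose text label CLIP scores highest for the image."""
    # Lazy import; the checkpoint is downloaded on first use.
    from transformers import pipeline

    labels = readable_labels(class_names)
    classifier = pipeline("zero-shot-image-classification",
                          model="openai/clip-vit-large-patch14")
    results = classifier(image_path, candidate_labels=list(labels))
    best = max(results, key=lambda r: r["score"])
    return labels[best["label"]]
```

Running both this function and the fine-tuned classifier over the same held-out images and comparing accuracy would be one way to reproduce the comparison.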

Model size: 85.8M params (Safetensors, tensor type F32)