
Model Card for mnistvit

A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation, achieving 99.65% test set accuracy.

Model Details

Model Description

The model is a vision transformer as described in the original paper by Dosovitskiy et al. (ICLR 2021).
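A ViT treats an image as a sequence of non-overlapping square patches. Each patch is flattened and linearly projected into a token embedding. The following plain-Python sketch shows only the patching step on a 28x28 MNIST-sized image. It is not taken from the mnistvit package. The patch size of 7 and the function name image_to_patches are assumptions chosen for illustration.

```python
# Minimal sketch of the ViT patching step on an MNIST-sized image.
# PATCH_SIZE = 7 is a hypothetical choice for illustration; the patch size
# actually used by mnistvit is set in its own configuration.

IMAGE_SIZE = 28  # MNIST images are 28x28 grayscale
PATCH_SIZE = 7   # assumption, not taken from the mnistvit config


def image_to_patches(image, patch_size):
    """Split a square 2D image (list of rows) into flattened patches.

    Patches are non-overlapping and returned in row-major order. In a ViT,
    each flattened patch would then be linearly projected to an embedding.
    """
    size = len(image)
    if size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, size, patch_size):
        for left in range(0, size, patch_size):
            # Flatten the patch row by row.
            patch = [
                image[top + r][left + c]
                for r in range(patch_size)
                for c in range(patch_size)
            ]
            patches.append(patch)
    return patches


if __name__ == "__main__":
    # Dummy image whose pixel values encode their position.
    image = [[r * IMAGE_SIZE + c for c in range(IMAGE_SIZE)]
             for r in range(IMAGE_SIZE)]
    patches = image_to_patches(image, PATCH_SIZE)
    print(len(patches), len(patches[0]))  # 16 patches of 49 values each
```

With a patch size of 7, each image becomes a sequence of (28 / 7)^2 = 16 tokens of 49 pixels each. The ViT paper additionally prepends a learnable class token before the transformer encoder.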

  • Developed by: Arno Onken
  • Model type: Vision Transformer
  • License: MIT

Model Sources

Uses

The model is intended for learning about vision transformers. It is small and was trained on MNIST, a simple and well-understood dataset. Used together with the mnistvit package code, it lets you explore the importance of various hyperparameters.

How to Get Started with the Model

Install the mnistvit package, which provides code for training and running the model:

pip install mnistvit

Place the config.json and model.pt files from this repository in a directory of your choice and run Python from that directory.

To evaluate the test set accuracy and loss of the model stored in model.pt with configuration config.json:

python -m mnistvit --use-accuracy --use-loss

Individual images can be classified as well. To predict the class of a digit image stored in a file sample.jpg:

python -m mnistvit --image-file sample.jpg

Training Details

Training Data

This model was trained on the 60,000 training set images of the MNIST dataset. Data augmentation was used in the form of random rotations, translations and scaling as detailed in the mnistvit.preprocess module.
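The augmentation described above can be illustrated with a minimal, dependency-free sketch of a random affine transform. The transform rotates and scales the image about its centre and then translates it, using nearest-neighbour inverse mapping. This is not the mnistvit.preprocess implementation. The function names and the default ranges in random_affine (±15 degrees, ±2 pixels, scale 0.9 to 1.1) are hypothetical placeholders.

```python
import math
import random


def affine_augment(image, angle_deg, shift_x, shift_y, scale, fill=0.0):
    """Rotate by angle_deg and scale about the image centre, then translate.

    Uses inverse mapping with nearest-neighbour sampling: for every output
    pixel, the corresponding source pixel is looked up. Output pixels whose
    source falls outside the image are set to `fill`.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Undo the translation and move the origin to the image centre.
            dx = x - shift_x - cx
            dy = y - shift_y - cy
            # Undo the rotation (rotate by -theta) and the scaling.
            src_x = round((cos_t * dx + sin_t * dy) / scale + cx)
            src_y = round((-sin_t * dx + cos_t * dy) / scale + cy)
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = image[src_y][src_x]
    return out


def random_affine(image, max_angle=15.0, max_shift=2.0,
                  scale_range=(0.9, 1.1), rng=random):
    """Apply a randomly sampled rotation, translation and scaling.

    The default ranges are hypothetical placeholders, not mnistvit's values.
    """
    angle = rng.uniform(-max_angle, max_angle)
    shift_x = rng.uniform(-max_shift, max_shift)
    shift_y = rng.uniform(-max_shift, max_shift)
    scale = rng.uniform(*scale_range)
    return affine_augment(image, angle, shift_x, shift_y, scale)
```

Calling random_affine on each training image per epoch produces a slightly different digit every time. This is the usual reason such augmentation helps a small model generalize on MNIST.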

Training Procedure

  • Training regime: fp32

Hyperparameters were obtained by running Ray Tune with Optuna on an 80:20 training/validation split of the original MNIST training set, as detailed in the mnistvit.tune module. The resulting values were then set as the default parameters in the mnistvit.train module.
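An 80:20 split of the 60,000 MNIST training images yields 48,000 images for training and 12,000 for validation during tuning. The sketch below shows one way to produce such a split with a seeded shuffle. The card does not say whether mnistvit.tune shuffles before splitting, or with which seed, so train_val_split and its seed argument are hypothetical.

```python
import random


def train_val_split(num_samples, val_percent=20, seed=0):
    """Return shuffled (train_indices, val_indices) for a percentage split.

    Hypothetical helper for illustration; not part of the mnistvit API.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    num_val = num_samples * val_percent // 100  # integer arithmetic, no rounding issues
    return indices[num_val:], indices[:num_val]


train_idx, val_idx = train_val_split(60_000)
print(len(train_idx), len(val_idx))  # 48000 12000
```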

Evaluation

Testing Data

This model was evaluated on the 10,000 test set images of the MNIST dataset.

Results

Test set accuracy: 99.65%

Test set cross-entropy loss: 0.011
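For reference, 99.65% accuracy on the 10,000 test images corresponds to 9,965 correctly classified digits, that is, 35 errors. The sketch below shows how accuracy and mean cross-entropy loss are typically computed from per-image logits. It assumes the reported loss is averaged over test images. The helpers evaluate and log_softmax are illustrative and not part of the mnistvit API.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]


def evaluate(all_logits, labels):
    """Return (accuracy, mean cross-entropy loss) over a set of samples."""
    correct = 0
    total_loss = 0.0
    for logits, label in zip(all_logits, labels):
        # Predicted class is the index of the largest logit.
        predicted = max(range(len(logits)), key=lambda k: logits[k])
        if predicted == label:
            correct += 1
        # Cross-entropy is the negative log-probability of the true class.
        total_loss -= log_softmax(logits)[label]
    n = len(labels)
    return correct / n, total_loss / n
```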


Dataset used to train asnelt/mnistvit