# Model Card for mnistvit
A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation, achieving 99.65% test set accuracy.
## Model Details

### Model Description
The model is a vision transformer (ViT), as described in Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR 2021.
- Developed by: Arno Onken
- Model type: Vision Transformer
- License: MIT
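To illustrate the first step of a vision transformer, the sketch below splits a 28x28 MNIST-sized image into non-overlapping patches and linearly projects each one into an embedding. The patch size and embedding dimension here are assumptions chosen for illustration, not values taken from the mnistvit package, and the projection uses random weights in place of learned ones.

```python
import numpy as np

# Assumed sizes for illustration only (not taken from mnistvit):
patch_size = 4   # 28 is divisible by 4, giving a 7x7 grid of patches
embed_dim = 64   # embedding dimension of each patch token

# Stand-in for a 28x28 grayscale MNIST digit
image = np.random.rand(28, 28).astype(np.float32)

# Split into (28/4)^2 = 49 non-overlapping patches of 4*4 = 16 pixels:
# reshape to (patch_row, row_in_patch, patch_col, col_in_patch), then
# regroup so each row of `patches` is one flattened patch.
patches = image.reshape(7, patch_size, 7, patch_size)
patches = patches.transpose(0, 2, 1, 3).reshape(49, patch_size * patch_size)

# Linear projection to patch embeddings (random stand-in for learned weights);
# a full ViT additionally prepends a class token and adds positional embeddings.
projection = np.random.rand(patch_size * patch_size, embed_dim).astype(np.float32)
tokens = patches @ projection

print(patches.shape)  # (49, 16)
print(tokens.shape)   # (49, 64)
```

The resulting token sequence is what the transformer encoder layers then operate on.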
### Model Sources
- Python Package Index: https://pypi.org/project/mnistvit/
- Paper: Dosovitskiy et al., ICLR 2021
## Uses
The model is intended to be used for learning about vision transformers. It is small and trained on MNIST, a simple and well-understood dataset. Together with the mnistvit package code, it can be used to explore the importance of various hyperparameters.
## How to Get Started with the Model
Install the mnistvit package, which provides code for training and running the model:

```sh
pip install mnistvit
```
Place the `config.json` and `model.pt` files from this repository in a directory of your choice and run Python from that directory. To evaluate the test set accuracy and loss of the model stored in `model.pt` with configuration `config.json`:

```sh
python -m mnistvit --use-accuracy --use-loss
```
Individual images can be classified as well. To predict the class of a digit image stored in a file `sample.jpg`:

```sh
python -m mnistvit --image-file sample.jpg
```
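Before classification, an input image must be brought into the format the model was trained on. The sketch below shows a typical preparation step for an MNIST-trained model, using the widely cited MNIST statistics (mean 0.1307, std 0.3081); the exact preprocessing applied by mnistvit lives in its `mnistvit.preprocess` module and may differ.

```python
import numpy as np

# Stand-in for an 8-bit grayscale image loaded from sample.jpg and
# already resized to the 28x28 MNIST resolution.
pixels = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Scale to [0, 1], then standardise with the commonly used MNIST
# statistics (these values are an assumption here, not read from mnistvit).
x = pixels.astype(np.float32) / 255.0
x = (x - 0.1307) / 0.3081

print(x.shape)  # (28, 28)
```

A framework's data pipeline would then add batch and channel dimensions before passing the array to the model.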
## Training Details

### Training Data
This model was trained on the 60,000 training set images of the MNIST dataset. Data augmentation was used in the form of random rotations, translations, and scaling, as detailed in the `mnistvit.preprocess` module.
### Training Procedure
- Training regime: fp32
Hyperparameters were obtained from an 80:20 training/validation split of the original MNIST training set, running Ray Tune with Optuna as detailed in the `mnistvit.tune` module. The resulting parameters were then set as the default parameters in the `mnistvit.train` module.
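The 80:20 split described above can be sketched as follows. This is a minimal illustration of partitioning the 60,000 MNIST training indices; the actual hyperparameter search uses Ray Tune with Optuna (see `mnistvit.tune`) and is not reproduced here.

```python
import numpy as np

# Shuffle the indices of the 60,000 MNIST training images
rng = np.random.default_rng(0)
indices = rng.permutation(60_000)

# 80:20 split: 48,000 images for training, 12,000 for validation
n_train = int(0.8 * len(indices))
train_idx, val_idx = indices[:n_train], indices[n_train:]

print(len(train_idx), len(val_idx))  # 48000 12000
```

Each hyperparameter candidate is then trained on the training portion and scored on the held-out validation portion.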
## Evaluation

### Testing Data
This model was evaluated on the 10,000 test set images of the MNIST dataset.
### Results
- Test set accuracy: 99.65%
- Test set cross-entropy loss: 0.011
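To put the reported loss in perspective: cross-entropy is the mean of -log(p) over the probabilities p the model assigns to the correct class, so an average loss of 0.011 corresponds to an average assigned probability of roughly exp(-0.011) ≈ 0.989.

```python
import numpy as np

# A mean cross-entropy of 0.011 implies the model assigns, on average,
# probability exp(-0.011) to the correct digit class.
mean_loss = 0.011
p_correct = np.exp(-mean_loss)

print(round(float(p_correct), 3))  # 0.989
```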