
TensorFlow Keras implementation of: Image classification with ConvMixer

The full credit goes to: Sayak Paul

Short description:

ConvMixer is a simple model that builds on two ideas: representing an image as patches (as in ViT) and separating the mixing of spatial and channel dimensions (as in MLP-Mixer). Unlike ViT and MLP-Mixer, it uses only standard convolution operations. The full paper, "Patches Are All You Need?", was submitted to ICLR 2022.
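
The description above can be sketched in Keras roughly as follows. This is a minimal illustration, not necessarily the exact code behind this model card: the patch size of 2 and kernel size of 5 are assumptions (the card does not state them), and the function names are hypothetical. A strided convolution produces the patch embeddings, a depthwise convolution mixes spatial locations, and a pointwise (1x1) convolution mixes channels.

```python
import tensorflow as tf
from tensorflow.keras import layers


def activation_block(x):
    # GELU followed by batch normalization, applied after each convolution.
    x = layers.Activation("gelu")(x)
    return layers.BatchNormalization()(x)


def conv_mixer_block(x, filters, kernel_size):
    # Spatial mixing: depthwise convolution with a residual connection.
    residual = x
    x = layers.DepthwiseConv2D(kernel_size=kernel_size, padding="same")(x)
    x = layers.Add()([activation_block(x), residual])
    # Channel mixing: pointwise (1x1) convolution.
    x = layers.Conv2D(filters, kernel_size=1)(x)
    return activation_block(x)


def build_conv_mixer(image_size=32, filters=256, depth=8,
                     kernel_size=5, patch_size=2, num_classes=10):
    # kernel_size and patch_size are assumed values, not taken from the card.
    inputs = tf.keras.Input((image_size, image_size, 3))
    # Patch embedding: a convolution whose stride equals its kernel size.
    x = layers.Conv2D(filters, kernel_size=patch_size, strides=patch_size)(inputs)
    x = activation_block(x)
    for _ in range(depth):
        x = conv_mixer_block(x, filters, kernel_size)
    # Classification head: global pooling, then a softmax layer.
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_conv_mixer()` with the defaults gives a model that maps 32x32 RGB images to 10 class probabilities; input scaling and data augmentation are left out of the sketch.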

Model and Dataset used

The dataset used here is CIFAR-10. The model is called ConvMixer-256/8, where 256 is the hidden dimension (the dimension of the patch embeddings) and 8 is the depth (the number of repetitions of the ConvMixer layer).
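
To get a feel for what 256/8 means in terms of model size, the arithmetic below estimates the patch grid and the trainable parameter count. It is a rough sketch under stated assumptions: a patch size of 2 and a depthwise kernel size of 5 (neither is given in this card), convolutions with bias terms, and one batch normalization layer (scale and offset) after every convolution. The helper name is hypothetical.

```python
def conv_mixer_trainable_params(hidden_dim, depth, kernel_size, patch_size,
                                in_channels=3, num_classes=10):
    """Estimate trainable parameters for an assumed ConvMixer layout."""
    batch_norm = 2 * hidden_dim  # learned scale and offset per channel
    # Patch embedding: strided convolution (weights + bias) plus batch norm.
    stem = patch_size * patch_size * in_channels * hidden_dim + hidden_dim + batch_norm
    # Depthwise convolution: one kernel per channel, plus bias and batch norm.
    depthwise = kernel_size * kernel_size * hidden_dim + hidden_dim + batch_norm
    # Pointwise (1x1) convolution: full channel mixing, plus bias and batch norm.
    pointwise = hidden_dim * hidden_dim + hidden_dim + batch_norm
    # Classification head: dense layer on the pooled features.
    head = hidden_dim * num_classes + num_classes
    return stem + depth * (depthwise + pointwise) + head


IMAGE_SIZE = 32   # CIFAR-10 images are 32x32 RGB
PATCH_SIZE = 2    # assumption, not stated in this card
KERNEL_SIZE = 5   # assumption, not stated in this card

num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2
total_params = conv_mixer_trainable_params(256, 8, KERNEL_SIZE, PATCH_SIZE)

print(f"patches per image: {num_patches}")
print(f"estimated trainable parameters: {total_params:,}")
```

Under these assumptions most of the parameters sit in the pointwise convolutions, which grow with the square of the hidden dimension, while the depth multiplies the per-layer cost linearly.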

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameter | Value |
| :-- | :-- |
| name | AdamW |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| weight_decay | 9.999999747378752e-05 |
| exclude_from_weight_decay | None |
| training_precision | float32 |
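
The long decimals above are what the round values 0.001, 0.9, 0.999 and 1e-4 look like after being stored in float32. The sketch below checks that and then shows how these hyperparameters enter a single AdamW update for one scalar parameter. It is a plain-Python illustration, not the optimizer code used for training: implementations differ on whether the decay term is scaled by the learning rate, and this sketch assumes that it is. The function names are hypothetical.

```python
import math
import struct


def as_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


LEARNING_RATE = 1e-3   # stored as 0.0010000000474974513 in float32
BETA_1 = 0.9           # stored as 0.8999999761581421
BETA_2 = 0.999         # stored as 0.9990000128746033
EPSILON = 1e-7
WEIGHT_DECAY = 1e-4    # stored as 9.999999747378752e-05


def adamw_step(param, grad, m, v, step):
    """One AdamW update for a scalar parameter (step counts from 1)."""
    # Decoupled weight decay, applied directly to the parameter.
    # Assumption: the decay is scaled by the learning rate.
    param = param * (1 - LEARNING_RATE * WEIGHT_DECAY)
    # Exponential moving averages of the gradient and its square.
    m = BETA_1 * m + (1 - BETA_1) * grad
    v = BETA_2 * v + (1 - BETA_2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - BETA_1 ** step)
    v_hat = v / (1 - BETA_2 ** step)
    # Adam update using the corrected moments.
    param = param - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPSILON)
    return param, m, v


print(as_float32(LEARNING_RATE), as_float32(WEIGHT_DECAY))
new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(new_param)
```

Because the weight decay acts on the parameter itself rather than being added to the gradient, it is not rescaled by the adaptive denominator, which is the main difference between AdamW and Adam with L2 regularization.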

Training Metrics

After 10 epochs, the test accuracy of the model is 83.57%.

Model Plot

[Model plot image]
