resnet18-v1 /
ScottMueller's picture
Update for model card
license: apache-2.0
  - Axon
  - Elixir
  - ImageNet


This ResNet18 model was translated from the ONNX ResNetv1 model found at into Axon using AxonOnnx The following description is copied from the relevant description at the ONNX repository.

Use cases

These ResNet models perform image classification - they take images as input and classify the major object in the image into a set of pre-defined classes. They are trained on ImageNet dataset which contains images from 1000 classes. ResNet models provide very high accuracies with affordable model sizes. They are ideal for cases when high accuracy of classification is required.

ImageNet trained models are often used as the base layers for a transfer learning approach to training a model in your domain. Transfer learning can significantly reduce the processing necessary to train an accurate model in your domain. This model was published here with the expectation that it would be useful to the Elixir community for transfer learning and other similar approaches.


Deeper neural networks are more difficult to train. Residual learning framework ease the training of networks that are substantially deeper. The research explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. It also provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset the residual nets were evaluated with a depth of up to 152 layers — 8× deeper than VGG nets but still having lower complexity.


ResNet models consists of residual blocks and came up to counter the effect of deteriorating accuracies with more layers due to network not learning the initial layers. ResNet v1 uses post-activation for the residual blocks.


All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. The inference was done using jpeg image.


The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferably happen at preprocessing.


The model outputs image scores for each of the 1000 classes of ImageNet.


The post-processing involves calculating the softmax probability scores for each class. You can also sort them to report the most probable classes. Check for code.


Dataset used for train and validation: ImageNet (ILSVRC2012). Check imagenet_prep for guidelines on preparing the dataset.