ResNet

This ResNet34 model was translated from the ONNX ResNetv1 model found at https://github.com/onnx/models/tree/main/vision/classification/resnet into Axon using AxonOnnx. The following description is copied from the relevant description at the ONNX repository.

Use cases

These ResNet models perform image classification: they take images as input and classify the major object in the image into a set of pre-defined classes. They are trained on the ImageNet dataset, which contains images from 1000 classes. ResNet models provide very high accuracies with affordable model sizes. They are ideal for cases when high accuracy of classification is required.

ImageNet trained models are often used as the base layers for a transfer learning approach to training a model in your domain. Transfer learning can significantly reduce the processing necessary to train an accurate model in your domain. This model was published here with the expectation that it would be useful to the Elixir community for transfer learning and other similar approaches.

Description

Deeper neural networks are more difficult to train. The residual learning framework eases the training of networks that are substantially deeper. The research explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. It also provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset the residual nets were evaluated with a depth of up to 152 layers, 8× deeper than VGG nets but still having lower complexity.

Model

ResNet models consist of residual blocks. They were introduced to counter the deterioration in accuracy that occurs as more layers are added, which is attributed to the network not learning the initial layers. ResNet v1 uses post-activation for the residual blocks.
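To make the post-activation idea concrete, here is a minimal sketch in plain Python. It is not the Axon or ONNX implementation: `residual_block` and `residual_fn` are hypothetical names, and the residual branch (in ResNet, a stack of convolutions and batch normalization) is stood in for by an arbitrary function on a flat list of floats. The point it illustrates is only the ordering: the skip connection is added first and the ReLU is applied afterwards.

```python
def relu(values):
    """Element-wise ReLU on a flat list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, residual_fn):
    """Post-activation residual block: ReLU(F(x) + x).

    `residual_fn` is a hypothetical stand-in for the learned residual
    branch F; it must return a list of the same length as `x`.
    """
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the input shape")
    # Add the identity shortcut first, then apply the activation.
    return relu([f + xi for f, xi in zip(fx, x)])


# Toy residual branch: scale every element by 0.5.
output = residual_block([1.0, -2.0, 3.0], lambda v: [0.5 * e for e in v])
```

If the residual branch outputs all zeros, the block reduces to `relu(x)`, which is why a block that has learned nothing still passes its input through largely unchanged.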

Input

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. Inference was done using JPEG images.

Preprocessing

The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferably happen at preprocessing.
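The normalization described above can be sketched in plain Python as follows. This is an illustrative sketch, not the ONNX repository's preprocessing script: the function names `preprocess` and `to_batch` are hypothetical, and it assumes the decoded image is an H x W x 3 nested list of 8-bit RGB values (0 to 255), so scaling to [0, 1] is done by dividing by 255. Resizing or cropping to at least 224 x 224 is not shown.

```python
# Per-channel statistics given in the model description (R, G, B order).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def preprocess(image_hwc):
    """Convert an H x W x 3 image of 8-bit RGB values into a
    3 x H x W nested list of normalized floats."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    chw = []
    for c in range(3):
        plane = []
        for y in range(height):
            row = []
            for x in range(width):
                scaled = image_hwc[y][x][c] / 255.0  # assumption: 8-bit input
                row.append((scaled - MEAN[c]) / STD[c])
            plane.append(row)
        chw.append(plane)
    return chw


def to_batch(images_hwc):
    """Stack preprocessed images into an N x 3 x H x W nested list."""
    return [preprocess(image) for image in images_hwc]


# Tiny 2 x 2 example image where every pixel is (255, 0, 128).
example = [[(255, 0, 128), (255, 0, 128)],
           [(255, 0, 128), (255, 0, 128)]]
batch = to_batch([example])
```

In practice the same arithmetic would be done on tensors rather than nested lists; the nested-list form is used here only to keep the example free of dependencies.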

Output

The model outputs image scores for each of the 1000 classes of ImageNet.

Postprocessing

The post-processing involves calculating the softmax probability scores for each class. You can also sort them to report the most probable classes. Check imagenet_postprocess.py for code.
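As a rough illustration of this step, the following plain-Python sketch computes softmax probabilities from raw scores and returns the most probable class indices. It is not the contents of imagenet_postprocess.py; the function names `softmax` and `top_k` and the default `k=5` are assumptions made for the example, and mapping indices to ImageNet label names is left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k=5):
    """Return the k most probable (class_index, probability) pairs,
    sorted from most to least probable."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy scores for three classes; the real model produces 1000 scores per image.
probabilities = softmax([1.0, 3.0, 2.0])
best_two = top_k([1.0, 3.0, 2.0], k=2)
```

For a batch of images, the same functions would be applied to each row of scores independently.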

Dataset

Dataset used for training and validation: ImageNet (ILSVRC2012). Check imagenet_prep for guidelines on preparing the dataset.

References

ResNetv1: He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
ONNX source model: onnx/models, vision/classification/resnet, resnet34-v1-7.onnx