Image Classification

EfficientNet

Use case : Image classification

Model description

EfficientNet was initially introduced in the paper "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (https://arxiv.org/pdf/1905.11946.pdf). The authors proposed a method that uniformly scales all three network dimensions (depth, width, and resolution) using a so-called compound coefficient. Using neural architecture search, they created the baseline EfficientNet-B0 topology and, starting from B0, derived the variants B1...B7, ordered by increasing complexity. Its main building blocks are the mobile inverted bottleneck MBConv (Sandler et al., 2018; Tan et al., 2019) and squeeze-and-excitation optimization (Hu et al., 2018).
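As a rough illustration of the compound-scaling idea, the sketch below derives depth, width, and resolution multipliers from a single coefficient phi. The constants 1.2, 1.1, and 1.15 are the grid-search values reported for B0 in the EfficientNet paper. The function names are hypothetical, and the released B1...B7 variants use rounded coefficients, so the printed numbers are indicative only. This is not the procedure used to derive the ST variant.

```python
# Compound scaling sketch: one coefficient phi drives all three dimensions.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases (paper values)


def compound_scale(phi: float) -> dict:
    """Return depth/width/resolution multipliers for a given phi (hypothetical helper)."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_factor(phi: float) -> float:
    """Approximate FLOPS growth, (alpha * beta^2 * gamma^2) ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in range(4):
    s = compound_scale(phi)
    print(phi, round(s["depth"], 3), round(s["width"], 3),
          round(s["resolution"], 3), round(flops_factor(phi), 2))
```

Because alpha * beta^2 * gamma^2 is close to 2, each unit increase of phi roughly doubles the compute cost, which is why even the smallest variants can exceed a microcontroller budget.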

EfficientNet provides state-of-the-art accuracy on datasets such as ImageNet and CIFAR while being much smaller and faster than comparable networks (ResNet, DenseNet, Inception...). However, for STM32 platforms, B0 is already too large. We therefore derived a custom version tailored for STM32 and modified it to be quantization-friendly, a topic not discussed in the original paper. This custom model is then quantized to int8 using the TensorFlow Lite converter. In the following, the resulting model is called ST EfficientNet LC v1 (LC standing for Low Complexity).

ST EfficientNet LC v1 was obtained by fine-tuning the original topology. Our goal was to reach around 500 kBytes for RAM and weights. To achieve this, we replaced the original 'swish' activation with a simple 'relu6' and searched for good expansion factor, depth, and width coefficients. Many models could meet this requirement; we selected the one that performed best on the food-101 dataset. We made several attempts to quantize the EfficientNet topology and discovered some issues when quantizing activations. The problem was fixed mainly by adding a clipping lambda layer before the sigmoid.
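To make the activation changes concrete, here is a minimal pure-Python sketch of 'swish', 'relu6', and a sigmoid preceded by clipping. The clipping bound of 6.0 is an assumption chosen for illustration (the text above does not state the value used), and the function names are hypothetical. One plausible reading of the fix is that bounding the sigmoid input keeps the range the quantizer has to cover small and predictable.

```python
import math

CLIP_BOUND = 6.0  # assumed value, not taken from the model description


def swish(x: float) -> float:
    """Original EfficientNet activation: x * sigmoid(x), unbounded above."""
    return x / (1.0 + math.exp(-x))


def relu6(x: float) -> float:
    """Replacement activation: output is confined to [0, 6]."""
    return min(max(x, 0.0), 6.0)


def clipped_sigmoid(x: float, bound: float = CLIP_BOUND) -> float:
    """Clip the input to [-bound, bound], then apply the sigmoid."""
    x = min(max(x, -bound), bound)
    return 1.0 / (1.0 + math.exp(-x))


for value in (-10.0, -1.0, 0.0, 3.0, 10.0):
    print(value, round(swish(value), 4), relu6(value), round(clipped_sigmoid(value), 4))
```

With the clip in place, inputs far outside the bound all map to the same sigmoid output, so extreme activations no longer stretch the quantization range.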

Network information

| Network Information | Value |
|---|---|
| Framework | TensorFlow Lite |
| Params | 517540 |
| Quantization | int8 |
| Paper | https://arxiv.org/pdf/1905.11946.pdf |

The models are quantized using the TensorFlow Lite converter.
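For readers unfamiliar with int8 quantization, the sketch below shows the affine mapping used by the TensorFlow Lite int8 scheme, where a real value is approximated as (q - zero_point) * scale. It only illustrates the arithmetic on scalars; it is not the converter's implementation, and the helper names are hypothetical.

```python
def quant_params(rmin: float, rmax: float, qmin: int = -128, qmax: int = 127):
    """Derive (scale, zero_point) so that [rmin, rmax] maps onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the range must contain zero
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a real value to int8, saturating at the range limits."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value."""
    return (q - zero_point) * scale


# Example: an activation bounded to [0, 6], as produced by relu6.
scale, zp = quant_params(0.0, 6.0)
for x in (0.0, 1.7, 6.0, 9.0):
    q = quantize(x, scale, zp)
    print(x, q, round(dequantize(q, scale, zp), 4))
```

Values outside the calibrated range (9.0 in the example) saturate at the int8 limit, which is one reason bounded activations tend to quantize well.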

Network inputs / outputs

For an image resolution of NxM and P classes :

| Input Shape | Description |
|---|---|
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape | Description |
|---|---|
| (1, P) | Per-class confidence for P classes |
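The input and output conventions above can be sketched in plain Python with nested lists standing in for tensors. The helper names are hypothetical, and a real application would use the buffers of whichever runtime executes the model.

```python
def make_input(image):
    """Wrap an N x M x 3 nested list of 0..255 integers into a (1, N, M, 3) batch."""
    for row in image:
        for pixel in row:
            if len(pixel) != 3 or any(not 0 <= c <= 255 for c in pixel):
                raise ValueError("expected RGB pixels with values in 0..255")
    return [image]


def shape(tensor):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def top1(output):
    """Return (class_index, confidence) from a (1, P) output."""
    scores = output[0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


image = [[[0, 128, 255] for _ in range(4)] for _ in range(3)]  # N=3, M=4 toy image
print(shape(make_input(image)))
print(top1([[0.1, 0.7, 0.2]]))
```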

Recommended platform

| Platform | Supported | Recommended |
|---|---|---|
| STM32L0 | [] | [] |
| STM32L4 | [] | [] |
| STM32U5 | [x] | [] |
| STM32H7 | [x] | [x] |
| STM32MP1 | [x] | [x] |
| STM32MP2 | [x] | [] |
| STM32N6 | [x] | [] |

Performances

Metrics

  • Measurements are made with the default STM32Cube.AI configuration, with the input / output allocated option enabled.

  • tfs stands for "training from scratch", meaning that the model weights were randomly initialized before training.

Reference NPU memory footprint on food-101 dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) | STM32Cube.AI version | STEdgeAI Core version |
|---|---|---|---|---|---|---|---|---|
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32N6 | 256 | 0 | 625.8 | 10.0.0 | 2.0.0 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32N6 | 784.02 | 0 | 632.55 | 10.0.0 | 2.0.0 |

Reference NPU inference time on food-101 dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STM32Cube.AI version | STEdgeAI Core version |
|---|---|---|---|---|---|---|---|---|
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32N6570-DK | NPU/MCU | 6.87 | 145.55 | 10.0.0 | 2.0.0 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 15.8 | 63.29 | 10.0.0 | 2.0.0 |
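The "Inf / sec" column is the reciprocal of the inference time. The short sketch below reproduces it from the latencies in the table; the helper name is hypothetical. From the rounded 6.87 ms latency it yields 145.56 rather than the 145.55 listed, presumably because the table value was computed from an unrounded measurement.

```python
def inferences_per_second(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to throughput."""
    return 1000.0 / latency_ms


for resolution, latency_ms in (("128x128x3", 6.87), ("224x224x3", 15.8)):
    print(resolution, round(inferences_per_second(latency_ms), 2))
```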

Reference MCU memory footprints based on Flowers dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Series | Activation RAM | Runtime RAM | Weights Flash | Code Flash | Total RAM | Total Flash | STM32Cube.AI version |
|---|---|---|---|---|---|---|---|---|---|---|
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32H7 | 430.78 KiB | 58.19 KiB | 505.41 KiB | 158.4 KiB | 488.97 KiB | 663.81 KiB | 10.0.0 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32H7 | 166.78 KiB | 57.86 KiB | 505.41 KiB | 157.68 KiB | 224.64 KiB | 663.09 KiB | 10.0.0 |
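The totals in this table are consistent with Total RAM being Activation RAM plus Runtime RAM, and Total Flash being Weights Flash plus Code Flash. The sketch below recomputes them from the table values; the helper name is hypothetical.

```python
# (resolution, activation RAM, runtime RAM, weights flash, code flash), all in KiB
rows = [
    ("224x224x3", 430.78, 58.19, 505.41, 158.40),
    ("128x128x3", 166.78, 57.86, 505.41, 157.68),
]


def totals(activation: float, runtime: float, weights: float, code: float):
    """Return (total RAM, total flash) in KiB, rounded to two decimals."""
    return round(activation + runtime, 2), round(weights + code, 2)


for resolution, *footprint in rows:
    total_ram, total_flash = totals(*footprint)
    print(resolution, total_ram, total_flash)
```

Only the activation buffer changes noticeably with resolution, since the weights and most of the runtime code are independent of the input size.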

Reference MCU inference time based on Flowers dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) | STM32Cube.AI version |
|---|---|---|---|---|---|---|---|
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 438.33 ms | 10.0.0 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 144.96 ms | 10.0.0 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32F769I-DISCO | 1 CPU | 216 MHz | 871.7 ms | 10.0.0 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32F769I-DISCO | 1 CPU | 216 MHz | 259.5 ms | 10.0.0 |

Reference MPU inference time based on Flowers dataset (see Accuracy for details on dataset)

| Model | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 36.75 ms | 16.89 | 83.11 | 0 | v5.1.0 | OpenVX |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 14.67 ms | 32.55 | 67.45 | 0 | v5.1.0 | OpenVX |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 140.6 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 47.50 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 198.7 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 63.84 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |

** To get the most out of MP25 NPU hardware acceleration, please use per-tensor quantization
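For context on the two schemes named in the note, the sketch below contrasts per-tensor and per-channel symmetric weight quantization on made-up values. Per-tensor uses one scale for the whole weight tensor, while per-channel uses one scale per output channel. The weights and helper names are hypothetical, and the sketch does not model why the MP25 NPU favors per-tensor.

```python
# Two output channels with very different magnitudes (made-up values).
weights = [[0.02, -0.01, 0.015], [1.5, -2.0, 0.7]]


def per_tensor_scale(w):
    """Single symmetric scale covering every weight in the tensor."""
    return max(abs(v) for channel in w for v in channel) / 127.0


def per_channel_scales(w):
    """One symmetric scale per output channel."""
    return [max(abs(v) for v in channel) / 127.0 for channel in w]


def quantize(value: float, scale: float) -> int:
    """Symmetric int8 quantization, saturating at +/-127."""
    return max(-127, min(127, int(round(value / scale))))


pt = per_tensor_scale(weights)
pc = per_channel_scales(weights)
print([quantize(v, pt) for v in weights[0]])     # small channel, shared scale
print([quantize(v, pc[0]) for v in weights[0]])  # small channel, own scale
```

With a shared scale, the small-magnitude channel collapses to values of magnitude 1, whereas its own scale spreads it across the int8 range. This is the usual accuracy argument for per-channel quantization, to be weighed against the hardware efficiency of per-tensor.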

Accuracy with Flowers dataset

Dataset details: http://download.tensorflow.org/example_images/flower_photos.tgz, License: CC BY 2.0, Number of classes: 5, Number of files: 3670

| Model | Format | Resolution | Top 1 Accuracy (%) |
|---|---|---|---|
| ST EfficientNet LC v1 tfs | Float | 224x224x3 | 90.19 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | 89.92 |
| ST EfficientNet LC v1 tfs | Float | 128x128x3 | 87.19 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | 86.78 |

Accuracy with Plant dataset

Dataset details: https://data.mendeley.com/datasets/tywbtsjrjv/1, License: CC0 1.0, Number of classes: 39, Number of files: 55448

| Model | Format | Resolution | Top 1 Accuracy (%) |
|---|---|---|---|
| ST EfficientNet LC v1 tfs | Float | 224x224x3 | 99.86 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | 99.78 |
| ST EfficientNet LC v1 tfs | Float | 128x128x3 | 99.76 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | 99.63 |

Accuracy with Food-101 dataset

Dataset details: https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/, Number of classes: 101, Number of files: 101000

| Model | Format | Resolution | Top 1 Accuracy (%) |
|---|---|---|---|
| ST EfficientNet LC v1 tfs | Float | 224x224x3 | 74.84 |
| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | 74.44 |
| ST EfficientNet LC v1 tfs | Float | 128x128x3 | 63.58 |
| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | 63.07 |
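To summarize the cost of quantization across the three accuracy tables, the sketch below computes the Float to Int8 drop in top-1 accuracy for each dataset and resolution. The numbers are copied from the tables; the helper name is hypothetical.

```python
# (dataset, resolution, float top-1 %, int8 top-1 %) copied from the accuracy tables
results = [
    ("Flowers", "224x224x3", 90.19, 89.92),
    ("Flowers", "128x128x3", 87.19, 86.78),
    ("Plant", "224x224x3", 99.86, 99.78),
    ("Plant", "128x128x3", 99.76, 99.63),
    ("Food-101", "224x224x3", 74.84, 74.44),
    ("Food-101", "128x128x3", 63.58, 63.07),
]


def int8_drop(float_acc: float, int8_acc: float) -> float:
    """Accuracy lost by quantization, in percentage points."""
    return round(float_acc - int8_acc, 2)


drops = {(d, r): int8_drop(f, q) for d, r, f, q in results}
worst = max(drops, key=drops.get)

for key, drop in drops.items():
    print(key, drop)
print("largest drop:", worst, drops[worst])
```

In every case listed, the drop stays within about half a percentage point.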

Retraining and Integration in a simple example:

Please refer to the stm32ai-modelzoo-services GitHub repository.

References

[1] "Tf_flowers : tensorflow datasets," TensorFlow. [Online]. Available: https://www.tensorflow.org/datasets/catalog/tf_flowers.

[2] J, Arun Pandian; Gopal, Geetharamani (2019), "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network," Mendeley Data, V1, doi: 10.17632/tywbtsjrjv.1

[3] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests." European Conference on Computer Vision, 2014.
