Model card for image quality assessment using Inception v3 encoder.

An official Zilia image classification model with an Inception v3 encoder. The encoder was pretrained in PyTorch on ImageNet-1k by torchvision.

Paper for Inception v3: https://arxiv.org/abs/1512.00567

Model Details

Model Description

The model uses the standard deep learning image classification architecture, with an Inception v3 encoder pretrained by torchvision and fine-tuned on the Zilia IQA dataset.

Developed by: Zilia Inc.
Authors:
- Jasmine Poirier: jasmine.poirier@ziliahealth.com
Funded by [optional]: Zilia Inc.
Shared by [optional]: Zilia Inc.
Model type: Image classification
Base model: InceptionV3
License: Copyright (C) Zilia Inc. - All Rights Reserved
Finetuned from encoder model [optional]: torchvision.
Model Stats:
- Params (M): 24.4
- GMACs: 6
- GFLOPS: 12
- Size (MB): 98
- Image size: (512, 608, 3)
- Output size single image with post processing: (1)
- Output size batch images with post processing: (batch, 1)
- Raw Output size: (batch, 1)
- Classification Label names: [iqa] (good / bad)
Dataset: Unknown
MLFlow uri: https://mlflow.zilia.zone/
MLFlow experiment names:
- Unknown
MLFlow run name: Unknown
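
The model stats above can be cross-checked with simple arithmetic. The sketch below assumes the weights are stored as 32-bit floats (4 bytes per parameter) and that one multiply-accumulate counts as two floating-point operations. Neither assumption is stated in this card.

```python
# Sanity check of the reported model stats.
# Assumptions (not stated in the card): float32 weights, 1 MAC = 2 FLOPs.
PARAMS_MILLIONS = 24.4   # "Params (M)" from the model stats
GMACS = 6                # "GMACs" from the model stats
BYTES_PER_PARAM = 4      # float32 storage, assumed

# Parameter count times bytes per parameter, expressed in megabytes.
size_mb = PARAMS_MILLIONS * 1e6 * BYTES_PER_PARAM / 1e6
# Each multiply-accumulate is counted as one multiply plus one add.
gflops = 2 * GMACS

print(f"Estimated size: {size_mb:.1f} MB")
print(f"Estimated compute: {gflops} GFLOPs")
```

Under these assumptions the estimate is roughly 97.6 megabytes and 12 GFLOPs, which is consistent with the reported size of about 98 and the reported 12 GFLOPS.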

Model Encoder Sources [optional]

Repository: https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py
HuggingFace: https://huggingface.co/timm/inception_v3.tv_in1k
Papers:
- Rethinking the Inception Architecture for Computer Vision: https://arxiv.org/abs/1512.00567

Access to huggingface-hub

Documentation for how to access huggingface-hub is available in zilia-models-iqa repository.

Model Usage

Documentation for how to load a model and perform prediction is available in zilia-models-iqa repository.

Bias, Risks, and Limitations

The model was trained on the public DRIMDB dataset, so the training data might not fully represent the images seen in production with the Zilia Ocular. Take care when using the model automatically, without human review or correction.

Recommendations

Use the model in an application where a human can review and correct the output classification.

Training Details

All training details are defined in IQA Project Document.

Preprocessing [optional]

Preprocessing is managed through the onnx/preprocessing.onnx file and the preprocessor_config.json file.

It corresponds to:

Convert the image to RGB
Convert to float and rescale to the range [0, 1]
Normalize using mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225]
Resize using PIL (PIL.Image.Resampling.BILINEAR) or CV (cv.INTER_LINEAR) to size (608, 512)

Postprocessing [optional]

Postprocessing is managed through the postprocessing.json file using a custom processor class.

It corresponds to:

Convert the output to int.
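
A minimal sketch of this step, using NumPy, is shown below. It is not the custom processor class itself: the function name `postprocess` and the `single_image` flag are hypothetical, and the sketch assumes the raw output of shape (batch, 1) already holds the values 0.0 or 1.0. The card does not say whether any thresholding happens before the cast.

```python
import numpy as np


def postprocess(raw_output, single_image: bool = False) -> np.ndarray:
    """Hypothetical helper: cast the raw (batch, 1) output to integers."""
    # Assumption: values are already 0.0 or 1.0, so the cast is lossless.
    labels = np.asarray(raw_output).astype(np.int64)
    # For a single image, drop the batch axis to get shape (1,),
    # matching the documented single-image output size.
    return labels[0] if single_image else labels


batch = postprocess(np.array([[1.0], [0.0], [1.0]]))
single = postprocess(np.array([[1.0]]), single_image=True)
print(batch.shape, single.shape)  # (3, 1) (1,)
```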

Evaluation / Test

All evaluation details are defined in IQA Project Document.

Metrics

Any standard classification metric (accuracy, precision, recall, F1-score) can be used to evaluate the model. We used accuracy and F1-score since they are the most common metrics for image classification.

Results

Results for the train, validation, test and DRIMDB datasets:

Metric	Train	Val	Test	DRIMDB
F1	0.923	0.879	0.900	0.933
Accuracy	0.905	0.852	0.875	0.947
Precision	0.971	0.945	0.947	0.875
Recall	0.881	0.821	0.857	1.000
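
As a consistency check on the table above, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below only uses the rounded values from the table, so small differences in the last digit are expected. The helper name `f1_score` and the `REPORTED` dictionary are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
REPORTED = {
    "Train": (0.971, 0.881, 0.923),
    "Val": (0.945, 0.821, 0.879),
    "Test": (0.947, 0.857, 0.900),
    "DRIMDB": (0.875, 1.000, 0.933),
}

for split, (precision, recall, reported_f1) in REPORTED.items():
    computed = f1_score(precision, recall)
    print(f"{split}: computed F1 = {computed:.3f}, reported F1 = {reported_f1:.3f}")
```

The recomputed values agree with the reported ones to within about 0.001 on every split, which is the size of discrepancy that rounding the inputs to three decimals would produce.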

Prediction times for the model:

	torch	onnx	onnx-sim
Preprocessing time [ms]	6.09	3.98	3.63
Prediction time [ms]	65.38	48.9	46
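
To compare the three backends, the two rows can be summed into a per-image latency and expressed as a speedup relative to torch. This sketch assumes that preprocessing plus prediction is the whole per-image cost, since the card reports no postprocessing time and does not describe the hardware. The variable names are illustrative.

```python
# Timings in milliseconds, copied from the table above.
TIMINGS_MS = {
    "torch": {"preprocessing": 6.09, "prediction": 65.38},
    "onnx": {"preprocessing": 3.98, "prediction": 48.9},
    "onnx-sim": {"preprocessing": 3.63, "prediction": 46.0},
}

# Total latency per backend (preprocessing + prediction).
totals = {name: sum(steps.values()) for name, steps in TIMINGS_MS.items()}
# Speedup of each backend relative to the torch baseline.
speedups = {name: totals["torch"] / total for name, total in totals.items()}

for name in TIMINGS_MS:
    print(f"{name}: total = {totals[name]:.2f} ms, "
          f"speedup vs torch = {speedups[name]:.2f}x")
```

With the reported numbers, the totals are about 71.47 ms, 52.88 ms and 49.63 ms, so the onnx and onnx-sim variants are roughly 1.35 and 1.44 times faster than torch.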

Summary

A prediction accuracy of 87.5% on the test dataset demonstrates that the model has learned to distinguish poor-quality from good-quality images according to the IFU exclusion criteria. However, this performance was quantitatively assessed on data acquired with older versions of the software, which produce qualitatively different images, mainly in terms of noise level and image exposure. To assess performance on acquisitions from software version 1.6.0 and above, the IQA was tested on 412 manually annotated images obtained with these newer versions, where it reached an accuracy of 75%.

These results highlighted how the absence of objective sharpness criteria during manual annotation affects model performance. To overcome these limitations, we will thoroughly characterize the impact of sharpness defects on spectral and StO2 measurements. This will allow us to establish a threshold for acceptable sharpness defects, considering their effect on identifying ocular structures and the accuracy and localization of spectral data.

A notable limitation of the current method is the inability to detect poor quality images presenting slight or unsaturated illumination defects which are not adequately represented in the training dataset. Prior to retraining, the correlation between illumination defects in fundus images and impaired spectral data will be evaluated, which may lead to revisions of the IFU review criteria. According to the current IFU, all sampling points exhibiting an illumination defect, regardless of intensity, position relative to the ROSA, or overall significance in the image, must be excluded from the StO2 statistic. This strict criterion was put in place to prevent any uncharacterized spectral or StO2 bias from influencing the statistics.

Finally, when retraining the model, examples of images presenting double structures as the only image quality defect will be added to the dataset to improve performance on such cases.

These limitations will be addressed in the future to improve the model's accuracy on more representative data and to reflect our more advanced knowledge of the correlation between image quality defects and spectral and StO2 accuracy. Even so, performance is satisfactory overall: among the 49 acquisitions from the internal usability study, only 4 were entirely rejected (fewer than 3 valid sampling points) due to IQA criteria, and all 4 were objectively poor-quality acquisitions.
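
The rejection rule mentioned above (an acquisition is entirely rejected when it has fewer than 3 valid sampling points) and the resulting rejection rate can be written out as a short sketch. The names `MIN_VALID_POINTS` and `is_rejected` are hypothetical and do not come from the Zilia codebase.

```python
# From the text: fewer than 3 valid sampling points means full rejection.
MIN_VALID_POINTS = 3


def is_rejected(valid_sampling_points: int) -> bool:
    """Hypothetical helper: True when an acquisition is entirely rejected."""
    return valid_sampling_points < MIN_VALID_POINTS


# Counts reported for the internal usability study.
rejected, total = 4, 49
rejection_rate = rejected / total
print(f"Rejection rate: {rejection_rate:.1%}")
```

With 4 rejections out of 49 acquisitions, the rejection rate is about 8%.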

The IQA algorithm thus currently provides an accurate assessment according to the IFU review criteria, but presents limitations on data acquired with software versions ≥ 1.6.0 that must be addressed before official implementation. Before such adjustments are made, the correlation between image quality defects (particularly illumination and sharpness defects) and spectral bias will be thoroughly characterized to refine the acquisition review guidelines presented in the IFU.

Citation

TP-ZO-CHT-05 V.0 Dataset description for Image Quality Assessment

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}

@ARTICLE{Zago2018-ck,
  title     = "Retinal image quality assessment using deep learning",
  author    = "Zago, Gabriel Tozatto and Andre{\~a}o, Rodrigo Varej{\~a}o and
               Dorizzi, Bernadette and Teatini Salles, Evandro Ottoni",
  abstract  = "Poor-quality retinal images do not allow an accurate medical
               diagnosis, and it is inconvenient for a patient to return to a
               medical center to repeat the fundus photography exam. In this
               paper, a robust automatic system is proposed to assess the
               quality of retinal images at the moment of the acquisition,
               aiming at assisting health care professionals during a fundus
               photography exam. We propose a convolutional neural network
               (CNN) pretrained on non-medical images for extracting general
               image features. The weights of the CNN are further adjusted via
               a fine-tuning procedure, resulting in a performant classifier
               obtained only with a small quantity of labeled images. The CNN
               performance was evaluated on two publicly available databases
               (i.e., DRIMDB and ELSA-Brasil) using two different procedures:
               intra-database and inter-database cross-validation. The CNN
               achieved an area under the curve (AUC) of 99.98\% on DRIMDB and
               an AUC of 98.56\% on ELSA-Brasil in the inter-database
               experiment, where training and testing were not performed on the
               same database. These results show the robustness of the proposed
               model to various image acquisitions without requiring special
               adaptation, thus making it a good candidate for use in
               operational clinical scenarios.",
  journal   = "Comput. Biol. Med.",
  publisher = "Elsevier BV",
  volume    =  103,
  pages     = "64--70",
  month     =  dec,
  year      =  2018,
  keywords  = "Convolutional neural networks; Deep learning; Diabetic
               retinopathy; Image quality; Retinal images",
  language  = "en"
}

@misc{szegedy2015rethinkinginceptionarchitecturecomputer,
      title={Rethinking the Inception Architecture for Computer Vision},
      author={Christian Szegedy and Vincent Vanhoucke and Sergey Ioffe and Jonathon Shlens and Zbigniew Wojna},
      year={2015},
      eprint={1512.00567},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/1512.00567},
}

@INPROCEEDINGS{5206848,
  author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Kai Li and Li Fei-Fei},
  booktitle={2009 IEEE Conference on Computer Vision and Pattern Recognition},
  title={ImageNet: A large-scale hierarchical image database},
  year={2009},
  volume={},
  number={},
  pages={248-255},
  keywords={Large-scale systems;Image databases;Explosions;Internet;Robustness;Information retrieval;Image retrieval;Multimedia databases;Ontologies;Spine},
  doi={10.1109/CVPR.2009.5206848}}