Lung Adeno/Squam GAN v1 Model Card

This model card describes a model associated with a manuscript that is currently under review. Links to the manuscript will be provided once publicly available.

Model Details

  • Developed by: James Dolezal
  • Model type: Generative adversarial network
  • Language(s): English
  • License: GPL-3.0
  • Model Description: This is a StyleGAN2 model that can generate synthetic H&E pathologic images of lung cancer. The GAN is conditioned on histologic subtype, with categories adenocarcinoma (=0) and squamous cell carcinoma (=1).
  • Image processing: This model generates images at 512 x 512 px resolution and was trained on lossless (PNG) pathologic images covering a 400 x 400 μm field.
  • Resources for more information: GitHub Repository

Uses

Examples

This model is a StyleGAN2 model and can be used with any StyleGAN-compatible scripts and tools. The GitHub repository associated with this model includes detailed information on how to interface with the GAN, generate images, and perform class blending via embedding interpolation.
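Class blending via embedding interpolation linearly mixes the learned embeddings of the two conditioning classes before synthesis. A minimal NumPy sketch of the interpolation step, using hypothetical random vectors in place of the GAN's actual learned class embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the GAN's learned class embeddings;
# in practice these come from the conditional mapping network.
emb_adeno = rng.normal(size=512)  # class 0: adenocarcinoma
emb_squam = rng.normal(size=512)  # class 1: squamous cell carcinoma

def blend(alpha: float) -> np.ndarray:
    """Linearly interpolate between the two class embeddings.

    alpha=0 yields the pure adenocarcinoma embedding,
    alpha=1 the pure squamous embedding.
    """
    return (1 - alpha) * emb_adeno + alpha * emb_squam

w_mid = blend(0.5)  # an embedding halfway between the two classes
```

The blended embedding would then be passed through the mapping and synthesis networks in place of a single-class embedding.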

Direct Use

This model is intended for research purposes only. Possible research areas and tasks include:

  • Applications in educational settings.
  • Research on pathology classification models for lung cancer.

Excluded uses are described below.

Misuse and Out-of-Scope Use

Output from this model should not be used in a clinical setting or be provided to patients, physicians, or any other health care members directly involved in their health care outside the context of an approved research protocol. Using the model in a clinical setting outside the context of an approved research protocol is a misuse of this model. This includes influencing a patient's health care treatment in any way based on output from this model.

Limitations

The training dataset did not include adenosquamous tumors, so intermediate states produced by the GAN through embedding interpolation may not be biologically consistent with true adenosquamous tumors.

Bias

This model was trained on The Cancer Genome Atlas (TCGA), which contains patient data from communities and cultures that may not reflect the general population. The dataset comprises images from multiple institutions, which may introduce bias from site-specific batch effects (Howard, 2021).

Training

Training Data

The following dataset was used to train the model:

  • The Cancer Genome Atlas (TCGA), LUAD (adenocarcinoma) and LUSC (squamous cell carcinoma) cohorts (see next section)

This model was trained on a total of 941 slides, with 467 adenocarcinomas and 474 squamous cell carcinomas.

Training Procedure

Each whole-slide image was sectioned in a grid-wise fashion to extract image tiles at 400 x 400 μm. Tiles were extracted at the nearest downsample layer and resized to 512 x 512 px using Libvips. During training, images are randomly flipped and rotated (90°, 180°, 270°). Training is otherwise identical to the official StyleGAN2 implementation.
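The "nearest downsample layer" selection can be sketched numerically. A minimal example, assuming a hypothetical base resolution of 0.25 μm/px and a hypothetical slide pyramid (actual values vary per slide):

```python
# Hypothetical slide parameters; TCGA slides are often ~0.25 μm/px at full resolution.
base_mpp = 0.25   # microns per pixel at pyramid level 0 (assumption)
tile_um = 400     # desired tile width in microns
target_px = 512   # output tile size in pixels

tile_px_full = tile_um / base_mpp      # tile width at full resolution: 1600.0 px
needed_ds = tile_px_full / target_px   # downsample that would hit 512 px exactly: 3.125

# Hypothetical pyramid downsample factors for this slide.
pyramid_ds = [1.0, 4.0, 16.0, 32.0]

# Pick the pyramid layer whose downsample is nearest the needed factor,
# read the tile there, then resize to the 512 px target.
nearest = min(pyramid_ds, key=lambda d: abs(d - needed_ds))
extract_px = round(tile_px_full / nearest)  # pixels read from the chosen layer
```

Reading from the nearest layer minimizes the amount of resizing needed to reach the 512 x 512 px target.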

Additional training information:

  • Hardware: 4 x A100 GPUs
  • Batch size: 32
  • R1 gamma: 1.6384
  • Training time: 25,000 kimg
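The hyperparameters above map onto the flags of the official StyleGAN2-ADA PyTorch training script. A hedged sketch of what the invocation may have looked like (the dataset path and output directory are placeholders, and the exact command is an assumption, not taken from the repository):

```shell
# Sketch of a conditional StyleGAN2-ADA training run matching the listed settings.
# /path/to/tiles.zip and /path/to/output are placeholders.
python train.py \
  --outdir=/path/to/output \
  --data=/path/to/tiles.zip \
  --cond=1 \          # condition on histologic subtype labels
  --gpus=4 \          # 4 x A100 GPUs
  --batch=32 \        # batch size 32
  --gamma=1.6384 \    # R1 regularization weight
  --kimg=25000 \      # 25,000 kimg of training
  --mirror=1          # random horizontal flips
```

Consult the GitHub repository associated with this model for the actual training configuration.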

Evaluation Results

External evaluation results are currently under peer review and will be posted once publicly available.
