---
license: other
license_name: hai-def
license_link: https://developers.google.com/health-ai-developer-foundations/terms
language:
- en
tags:
- medical
- x-ray
- chest-x-ray
- medical-embeddings
extra_gated_heading: Access CXR Foundation on Hugging Face
extra_gated_prompt: >-
To access CXR Foundation on Hugging Face, you're required to review and
agree to [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
To do this, please ensure you’re logged in to Hugging Face and click below.
Requests are processed immediately.
extra_gated_button_content: Acknowledge license
library_name: cxr-foundation
---
# CXR Foundation model card
**Model documentation**:
[CXR Foundation](https://developers.google.com/health-ai-developer-foundations/cxr-foundation)
**Resources**:
* Model on Google Cloud Model Garden:
[CXR Foundation](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation)
* Model on Hugging Face:
[google/cxr-foundation](https://huggingface.co/google/cxr-foundation)
* GitHub repository (supporting code, Colab notebooks, discussions, and
issues): [cxr-foundation](https://github.com/google-health/cxr-foundation)
* Quick start notebook:
[notebooks/quick_start](https://github.com/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb)
* Support: See
[Contact](https://developers.google.com/health-ai-developer-foundations/cxr-foundation/get-started.md#contact).
**Terms of use**:
[Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms)
**Author**: Google
## Model information
This section describes the CXR Foundation model and how to use it.
### Description
CXR Foundation is a machine learning model designed to accelerate AI development
for chest X-ray image analysis. It is pre-trained on a large corpus of chest
X-rays to produce embeddings that capture dense features relevant to analyzing
these images. As a result, the embeddings CXR Foundation produces enable
efficient training of AI models with significantly less data and compute than
traditional methods. CXR Foundation offers two types of embeddings:
* ELIXR v2.0: Produces 32x768 dimensional vectors, capturing detailed image
features relevant to X-ray analysis.
* ELIXR-contrastive / v2.0 text: Generates 32x128 dimensional vectors and
allows for projecting chest X-ray images and textual prompts into a shared
embedding space. This enables powerful applications like semantic image
retrieval and zero-shot classification.
You can read more about the research behind CXR Foundation in our manuscript:
[ELIXR: Towards a general purpose X-ray artificial intelligence system through
alignment of large language models and radiology vision
encoders](https://arxiv.org/abs/2308.01317).
### How to use
To get started quickly with Hugging Face, refer to the Quick start notebook
in the next section.
If you want to use the model at scale, we recommend that you create a production
version using
[Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation).
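As a rough illustration, the sketch below shows one way to fetch the gated weights from Hugging Face and load a TensorFlow SavedModel. The sub-directory name is a placeholder, not the repository's actual layout; the quick start notebook linked below is the authoritative reference for loading and running the model.

```python
# Minimal sketch, assuming the repository ships TensorFlow SavedModel directories.
# Requires prior authentication (e.g. `huggingface-cli login`) and accepting the
# Health AI Developer Foundations terms on the model page.
from huggingface_hub import snapshot_download
import tensorflow as tf

local_dir = snapshot_download(repo_id="google/cxr-foundation")

# "path/to/savedmodel" is a placeholder -- inspect `local_dir` for the actual
# SavedModel directory names shipped in the repository.
model = tf.saved_model.load(f"{local_dir}/path/to/savedmodel")
```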
### Examples
See the following Colab notebooks for examples of how to use CXR Foundation:
* To give the model a quick try by running it locally with weights from Hugging
  Face, see
  [Quick start notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb).
* For an example of how to use the model to train a linear classifier, see
  [Linear classifier notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/train_data_efficient_classifier.ipynb).
* For an example of how to retrieve images from a database using text-image
  similarity, see
  [Text retrieval notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/retrieve_images_by_text.ipynb).
* For an example of how to use the text embeddings to perform zero-shot
  inference, see
  [Zero-shot inference notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/classify_images_with_natural_language.ipynb).
### Model architecture overview
The model uses the
[EfficientNet-L2 architecture](https://arxiv.org/pdf/1911.04252v4.pdf) and
[BERT architecture](https://arxiv.org/abs/1810.04805). It was trained on 821,544
CXRs from India and the US. Image-level abnormal vs. normal labels (i.e., whether
the image contained any kind of abnormality) were used with the
[Supervised Contrastive loss](https://arxiv.org/abs/2004.11362v1), and the
accompanying radiology reports were used with the
[CLIP loss](https://arxiv.org/pdf/2103.00020.pdf) and
[BLIP-2 losses](https://arxiv.org/abs/2301.12597). The abnormal vs. normal
labels were obtained from more granular labels (e.g., pneumothorax, fracture) as
well as
[regular expressions on radiology reports](https://pubmed.ncbi.nlm.nih.gov/34471144/).
You can read more about the research behind CXR Foundation in our recent
publication:
[Simplified Transfer Learning for Chest Radiography Models Using Less Data.](https://pubs.rsna.org/doi/10.1148/radiol.212482)
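For intuition about the contrastive objectives named above, here is a rough sketch of a CLIP-style image-text loss in TensorFlow. This is not the actual training code; the function and its arguments are our own illustration of the general form of the objective.

```python
import tensorflow as tf

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text embeddings.

    image_emb, text_emb: [batch, dim] embeddings of matched image-report pairs.
    """
    image_emb = tf.math.l2_normalize(image_emb, axis=-1)
    text_emb = tf.math.l2_normalize(text_emb, axis=-1)
    # Cosine-similarity logits; matching pairs lie on the diagonal.
    logits = tf.matmul(image_emb, text_emb, transpose_b=True) / temperature
    labels = tf.range(tf.shape(logits)[0])
    loss_img = tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)
    loss_txt = tf.keras.losses.sparse_categorical_crossentropy(
        labels, tf.transpose(logits), from_logits=True)
    return tf.reduce_mean(loss_img + loss_txt) / 2.0
```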
### Technical specifications
* Model type: Convolutional neural network that produces embeddings
* Key publications:
* [Simplified Transfer Learning for Chest Radiography Models Using Less
Data](https://pubs.rsna.org/doi/10.1148/radiol.212482)
* [ELIXR: Towards a general purpose X-ray artificial intelligence system
through alignment of large language models and radiology vision
encoders](https://arxiv.org/abs/2308.01317)
* Model created: August 2, 2024
* Model version: 2.0.0
### Performance and validation
CXR Foundation was evaluated across a range of tasks, including data-efficient
classification, zero-shot classification, semantic image retrieval, visual
question answering, and report quality assurance.
### Key performance metrics
* Data-efficient classification: **Mean AUC of 0.898** (across atelectasis,
  cardiomegaly, consolidation, pleural effusion, and pulmonary edema) on the
  CheXpert test set
* Zero-shot classification: **Mean AUC of 0.846 across 13 findings** on the
  CheXpert test set. Findings included: atelectasis, cardiomegaly, consolidation,
  pleural effusion, pulmonary edema, enlarged cardiomediastinum, pleural other,
  pneumothorax, support devices, airspace opacity, lung lesion, pneumonia, and
  fracture.
* Semantic image retrieval: **0.76 normalized discounted cumulative gain
(NDCG) @5** across 19 queries for semantic image retrieval, including
perfect retrieval on 12 of them.
* Reference: [ELIXR: Towards a general purpose X-ray artificial intelligence
system through alignment of large language models and radiology vision
encoders](https://arxiv.org/pdf/2308.01317)
### Inputs and outputs
* **Input**: Serialized `tf.Example` with the bytes of a `PNG` image written in
  the `image/encoded` feature key (see the sketch below).
* **Output**: Embedding (a vector of floating-point values representing a
  projection of the original image into a compressed feature space).
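The input format above can be produced with standard TensorFlow APIs. Below is a minimal sketch that packages a PNG file into a serialized `tf.Example` under the `image/encoded` feature key; any additional preprocessing (resizing, DICOM conversion) shown in the notebooks is omitted here.

```python
import tensorflow as tf

def make_serialized_example(png_path: str) -> bytes:
    """Wrap raw PNG bytes in a serialized tf.Example under `image/encoded`."""
    with open(png_path, "rb") as f:
        png_bytes = f.read()
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image/encoded": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[png_bytes])
                )
            }
        )
    )
    return example.SerializeToString()
```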
## Dataset details
### Training dataset
CXR Foundation was trained using the following de-identified datasets:
* MIMIC-CXR, comprising 243,324 images of 60,523 unique patients (cited
  below);
* A private US dataset from an AMC in Illinois, comprising 165,182 images of
  12,988 unique patients; and
* A private Indian dataset from five hospitals, comprising 485,082 images of
  348,335 unique patients.
### Labeling
Supervised learning was used to derive abnormal vs. normal labels from radiology
reports.
A medically tuned LLM, Med-PaLM 2, was then applied to ensure that the labels
were consistent with the reports, and a board-certified thoracic radiologist (CL)
adjudicated cases where the LLM results differed from the ground truth in
MIMIC-CXR.
*Additional information about data and labels used to evaluate CXR Foundation
for downstream tasks can be found in the following references:*
- [Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest
  Radiography Models Using Less Data. Radiology.
  2022.](https://pubs.rsna.org/doi/full/10.1148/radiol.212482) (see Tables 1, 2,
  and 3)
- [https://github.com/google-research/google-research/tree/master/supcon](https://github.com/google-research/google-research/tree/master/supcon)
## License
The use of CXR Foundation is governed by the
[Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
## Data citation
- [MIMIC-CXR Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S.
(2024). MIMIC-CXR Database (version 2.1.0).
PhysioNet.](https://doi.org/10.13026/4jqj-jw95)
- [Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a
de-identified publicly available database of chest radiographs with
free-text reports. Sci Data 6, 317
(2019).](https://doi.org/10.1038/s41597-019-0322-0)
- Available on PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J.,
  Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). [PhysioBank,
  PhysioToolkit, and PhysioNet: Components of a new research resource for
  complex physiologic signals. Circulation [Online]. 101 (23), pp.
  e215–e220.](https://pubmed.ncbi.nlm.nih.gov/10851218/)
## Implementation information
Details about the model internals.
### Software
Training was done using [JAX](https://github.com/google/jax).
JAX allows researchers to take advantage of the latest generation of hardware,
including TPUs, for faster and more efficient training of large models.
## Use and limitations
### Intended use
* CXR Foundation can reduce the training data, compute, and technical
expertise necessary to develop AI applications for radiographs. The model
has been optimized for chest X-rays, but researchers have reported success
using it for other types of X-rays, including X-rays of other body parts and
even veterinary X-rays. Some example applications include:
#### Data-efficient classification
With a small amount of labeled data, you can train a classifier model on top of
CXR Foundation embeddings (ELIXR v2.0). Furthermore, each embedding can be used
downstream as an input for a variety of different classifiers, with very little
additional compute (a minimal training sketch follows the example tasks below).
Below are some example classification tasks:
* Clinical findings like fracture or pneumothorax
* Determining X-ray image quality
* Determining the X-ray view or body part
* Determining the presence of devices
* Discovering misplaced tubes
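As referenced above, here is a minimal sketch of a linear probe trained on precomputed ELIXR v2.0 embeddings (shape 32x768 per image). Mean-pooling the 32 token embeddings and the use of scikit-learn are our own simplifications; the linear classifier notebook shows the supported workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool(embedding: np.ndarray) -> np.ndarray:
    """Mean-pool a (32, 768) ELIXR v2.0 embedding into a single (768,) vector."""
    return embedding.mean(axis=0)

def train_linear_probe(embeddings, labels):
    """embeddings: list of (32, 768) arrays; labels: 0/1 per image (e.g. fracture present)."""
    features = np.stack([pool(e) for e in embeddings])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```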
#### Zero-shot classification
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can obtain a
classification score from textual prompts alone, without any additional training
data. Zero-shot classification works by measuring the relative distance of the
image embedding from a positive text prompt (e.g., "pleural effusion present")
and a negative text prompt (e.g., "normal X-ray"). The use cases are the same as
for data-efficient classification but don't require data to train. The zero-shot
method will outperform data-efficient classification at low levels of training
data, while data-efficient classification will tend to exceed zero-shot
performance with larger amounts of data. See the
[ELIXR paper](https://arxiv.org/pdf/2308.01317) for more details.
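A minimal sketch of this scoring scheme, assuming the image and text contrastive embeddings have already been pooled to single vectors (our simplification), could look like this; the zero-shot notebook is the authoritative reference.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_score(image_emb, pos_text_emb, neg_text_emb) -> float:
    """Higher score means the image is closer to the positive prompt
    (e.g. "pleural effusion present") than to the negative one (e.g. "normal X-ray")."""
    return cosine(image_emb, pos_text_emb) - cosine(image_emb, neg_text_emb)
```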
#### Semantic image retrieval
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can rank a
set of X-rays against a search query. Similar to zero-shot classification,
language-based image retrieval relies on the distance between the embeddings of
the set of images and the text embedding of the search query.
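As a sketch under the same pooling assumption, ranking a set of image embeddings against a text query reduces to sorting by cosine similarity:

```python
import numpy as np

def rank_images(image_embs: np.ndarray, query_text_emb: np.ndarray) -> np.ndarray:
    """image_embs: [num_images, dim]; query_text_emb: [dim]. Returns indices, best match first."""
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    query = query_text_emb / np.linalg.norm(query_text_emb)
    return np.argsort(-(imgs @ query))
```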
### Benefits
* CXR Foundation embeddings can be used to develop AI models for chest X-ray
  image analysis with significantly less data and compute than traditional
  methods.
* By leveraging the large dataset that CXR Foundation was pre-trained on, users
  need less data and can also build more generalizable models than by training
  on more limited datasets.
### Limitations
The following are known factors that might limit the generalizability or
usefulness of the model output for application in downstream tasks:
* The model was trained using only de-identified data from the US and India
and may not generalize well to data from other countries, patient
populations, or manufacturers not used in training.
* The model has only been validated for a limited number of the many potential
downstream tasks involving chest radiographs.
* Image quality and resolution can limit performance; a minimum resolution of
  1024x1024 is recommended.
* The model is only used to generate embeddings of user-provided data. It does
not generate any predictions or diagnosis on its own.
* Task-specific validation remains an important aspect of downstream model
development by the end user.
* As with any research, developers should ensure that any downstream
application is validated to understand performance using data that is
appropriately representative of the intended use setting for the specific
application (e.g., age, sex, gender, condition, scanner, etc.).