|
--- |
|
license: other |
|
license_name: hai-def |
|
license_link: https://developers.google.com/health-ai-developer-foundations/terms |
|
language: |
|
- en |
|
tags: |
|
- medical |
|
- x-ray |
|
- chest-x-ray |
|
- medical-embeddings |
|
extra_gated_heading: Access CXR Foundation on Hugging Face |
|
extra_gated_prompt: >- |
|
To access CXR Foundation on Hugging Face, you're required to review and |
|
agree to [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms). |
|
To do this, please ensure you’re logged in to Hugging Face and click below. |
|
Requests are processed immediately. |
|
extra_gated_button_content: Acknowledge license |
|
library_name: cxr-foundation |
|
--- |
|
|
|
# CXR Foundation model card |
|
|
|
**Model documentation**: |
|
[CXR Foundation](https://developers.google.com/health-ai-developer-foundations/cxr-foundation) |
|
|
|
**Resources**: |
|
|
|
* Model on Google Cloud Model Garden: |
|
[CXR Foundation](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation) |
|
* Model on Hugging Face: |
|
[google/cxr-foundation](https://huggingface.co/google/cxr-foundation) |
|
* GitHub repository (supporting code, Colab notebooks, discussions, and |
|
issues): [cxr-foundation](https://github.com/google-health/cxr-foundation) |
|
* Quick start notebook: |
|
[notebooks/quick_start](https://github.com/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb) |
|
* Support: See |
|
[Contact](https://developers.google.com/health-ai-developer-foundations/cxr-foundation/get-started.md#contact). |
|
|
|
**Terms of use**: |
|
[Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms) |
|
|
|
**Author**: Google |
|
|
|
## Model information |
|
|
|
This section describes the CXR Foundation model and how to use it. |
|
|
|
### Description |
|
|
|
CXR Foundation is a machine learning model designed to accelerate AI development |
|
for chest X-ray image analysis. It is pre-trained on a large number of chest X-rays to produce embeddings that capture dense features relevant for analyzing
|
these images. As a result, the embeddings CXR Foundation produces enable the |
|
efficient training of AI models with significantly less data and compute than |
|
traditional methods. CXR Foundation offers two types of embeddings: |
|
|
|
* ELIXR v2.0: Produces 32x768 dimensional vectors, capturing detailed image |
|
features relevant to X-ray analysis. |
|
* ELIXR-contrastive / v2.0 text: Generates 32x128 dimensional vectors and |
|
allows for projecting chest X-ray images and textual prompts into a shared |
|
embedding space. This enables powerful applications like semantic image |
|
retrieval and zero-shot classification. |
|
|
|
You can read more about the research behind CXR Foundation in our manuscript: |
|
[ELIXR: Towards a general purpose X-ray artificial intelligence system through |
|
alignment of large language models and radiology vision |
|
encoders](https://arxiv.org/abs/2308.01317). |
|
|
|
### How to use |
|
|
|
To get started quickly with Hugging Face, refer to the Quick start notebook
|
in the next section. |
|
|
|
If you want to use the model at scale, we recommend that you create a production |
|
version using |
|
[Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation). |
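
As a minimal sketch of the Hugging Face route (assuming `huggingface_hub` is installed, you have accepted the terms above, and you are authenticated, e.g. via `huggingface-cli login`), the model files can be downloaded locally as shown below; loading and running them is covered in the quick start notebook:

```python
# Minimal download sketch (not the full workflow): fetch the CXR Foundation
# files from Hugging Face. See the quick start notebook for how to load and
# run the downloaded model.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="google/cxr-foundation")
print("CXR Foundation files downloaded to:", local_dir)
```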
|
|
|
### Examples |
|
|
|
See the following Colab notebooks for examples of how to use CXR Foundation: |
|
|
|
* To give the model a quick try by running it locally with weights from Hugging Face, see
|
[Quick start notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb). |
|
|
|
* For an example of how to use the model to train a linear classifier, see
|
[Linear classifier notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/train_data_efficient_classifier.ipynb). |
|
|
|
* For an example of how to retrieve images from a database using text-image |
|
similarity, see
|
[Text retrieval notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/retrieve_images_by_text.ipynb). |
|
|
|
* For an example of how to use the text embeddings to perform zero-shot |
|
inference, see
|
[Zero-shot inference notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/classify_images_with_natural_language.ipynb). |
|
|
|
### Model architecture overview |
|
|
|
The model uses the |
|
[EfficientNet-L2 architecture](https://arxiv.org/pdf/1911.04252v4.pdf) and |
|
[BERT architecture](https://arxiv.org/abs/1810.04805). It was trained on 821,544 |
|
CXRs from India and the US. Training used abnormal vs. normal labels (i.e., whether the image contained any kind of abnormality) with the
[Supervised Contrastive loss](https://arxiv.org/abs/2004.11362v1), as well as accompanying radiology reports with the
[CLIP loss](https://arxiv.org/pdf/2103.00020.pdf) and
[BLIP-2 losses](https://arxiv.org/abs/2301.12597). The abnormal vs. normal
|
labels were obtained from more granular labels (e.g. pneumothorax, fracture) as |
|
well as |
|
[regular expressions on radiology reports](https://pubmed.ncbi.nlm.nih.gov/34471144/). |
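
For orientation, the CLIP-style image-text objective linked above is the standard symmetric contrastive (InfoNCE) loss. The formulation below follows the CLIP paper and is shown only as a reference, not as an excerpt from the ELIXR training setup:

$$
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[\log\frac{\exp\left(\operatorname{sim}(v_i, t_i)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\operatorname{sim}(v_i, t_j)/\tau\right)} + \log\frac{\exp\left(\operatorname{sim}(v_i, t_i)/\tau\right)}{\sum_{j=1}^{N}\exp\left(\operatorname{sim}(v_j, t_i)/\tau\right)}\right]
$$

where v_i and t_i are the image and text embeddings of the i-th pair in a batch of N, sim is cosine similarity, and τ is a learned temperature.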
|
|
|
You can read more about the research behind CXR Foundation in our recent |
|
publication: |
|
[Simplified Transfer Learning for Chest Radiography Models Using Less Data.](https://pubs.rsna.org/doi/10.1148/radiol.212482) |
|
|
|
### Technical specifications |
|
|
|
* Model type: Convolutional neural network that produces embeddings |
|
* Key publications: |
|
* [Simplified Transfer Learning for Chest Radiography Models Using Less |
|
Data](https://pubs.rsna.org/doi/10.1148/radiol.212482) |
|
* [ELIXR: Towards a general purpose X-ray artificial intelligence system |
|
through alignment of large language models and radiology vision |
|
encoders](https://arxiv.org/abs/2308.01317) |
|
* Model created: August 2, 2024 |
|
* Model version: 2.0.0
|
|
|
### Performance and validation |
|
|
|
CXR Foundation was evaluated across a range of tasks, including data-efficient classification, zero-shot classification, semantic image retrieval, visual question answering, and report quality assurance.
|
|
|
### Key performance metrics |
|
|
|
* Data-efficient classification: **Mean AUC of 0.898** (across atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) on the CheXpert test set.
|
|
|
* Zero-shot classification: **Mean AUC of 0.846 across 13 findings** on the CheXpert test set. Findings included: atelectasis, cardiomegaly, consolidation, pleural effusion, pulmonary edema, enlarged cardiomediastinum, pleural other, pneumothorax, support devices, airspace opacity, lung lesion, pneumonia, and fracture.
|
|
|
* Semantic image retrieval: **0.76 normalized discounted cumulative gain |
|
(NDCG) @5** across 19 queries for semantic image retrieval, including |
|
perfect retrieval on 12 of them. |
|
|
|
* Reference: [ELIXR: Towards a general purpose X-ray artificial intelligence |
|
system through alignment of large language models and radiology vision |
|
encoders](https://arxiv.org/pdf/2308.01317) |
|
|
|
### Inputs and outputs |
|
|
|
* **Input**: Serialized `tf.Example` with the bytes of a PNG image written to the `image/encoded` feature key (see the construction sketch after this list).
|
|
|
* **Output**: Embedding (a vector of floating-point values representing a projection of the original image into a compressed feature space).
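
As an illustration of this input format, here is a minimal sketch (not taken from the official notebooks) of packaging a PNG chest X-ray as a serialized `tf.Example`; the file name is a placeholder:

```python
# Minimal sketch: wrap PNG bytes in a tf.Example under the "image/encoded" key,
# then serialize it for the model. "cxr.png" is a placeholder file name.
import tensorflow as tf

with open("cxr.png", "rb") as f:
    png_bytes = f.read()

example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "image/encoded": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[png_bytes])
            )
        }
    )
)

serialized_input = example.SerializeToString()  # pass this to the model
```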
|
|
|
## Dataset details |
|
|
|
### Training dataset |
|
|
|
CXR Foundation was trained using the following de-identified datasets: |
|
|
|
* MIMIC-CXR, comprising 243,324 images of 60,523 unique patients (cited below);
* A private US dataset from an academic medical center (AMC) in Illinois, comprising 165,182 images of 12,988 unique patients; and
* A private Indian dataset from five hospitals, comprising 485,082 images of 348,335 unique patients.
|
|
|
### Labeling |
|
|
|
Abnormal vs. normal labels used for supervised training were derived from radiology reports.
|
|
|
A medically tuned LLM, Med-PaLM 2, was then applied to ensure that the labels were consistent with the report, and a board-certified thoracic radiologist (CL)
|
adjudicated cases where the LLM results differed from the ground truth in |
|
MIMIC-CXR. |
|
|
|
*Additional information about data and labels used to evaluate CXR Foundation |
|
for downstream tasks can be found in the following references:* |
|
|
|
- [Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest |
|
Radiography Models Using Less Data. Radiology. |
|
2022.](https://pubs.rsna.org/doi/full/10.1148/radiol.212482) |
|
- [https://pubs.rsna.org/doi/10.1148/radiol.212482](https://pubs.rsna.org/doi/10.1148/radiol.212482) |
|
(Table 1, 2, 3) |
|
- [https://github.com/google-research/google-research/tree/master/supcon](https://github.com/google-research/google-research/tree/master/supcon) |
|
|
|
## License |
|
|
|
The use of CXR Foundation is governed by the |
|
[Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms). |
|
|
|
## Data citation |
|
|
|
- [MIMIC-CXR: Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S.
|
(2024). MIMIC-CXR Database (version 2.1.0). |
|
PhysioNet.](https://doi.org/10.13026/4jqj-jw95) |
|
- [Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a |
|
de-identified publicly available database of chest radiographs with |
|
free-text reports. Sci Data 6, 317 |
|
(2019).](https://doi.org/10.1038/s41597-019-0322-0) |
|
- Available on PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J.,
|
Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). [PhysioBank, |
|
PhysioToolkit, and PhysioNet: Components of a new research resource for |
|
complex physiologic signals. Circulation [Online]. 101 (23), pp. |
|
e215–e220.](https://pubmed.ncbi.nlm.nih.gov/10851218/) |
|
|
|
## Implementation information |
|
|
|
Details about the model internals. |
|
|
|
### Software |
|
|
|
Training was done using [JAX](https://github.com/google/jax).
|
|
|
JAX allows researchers to take advantage of the latest generation of hardware, |
|
including TPUs, for faster and more efficient training of large models. |
|
|
|
## Use and limitations |
|
|
|
### Intended use |
|
|
|
* CXR Foundation can reduce the training data, compute, and technical |
|
expertise necessary to develop AI applications for radiographs. The model |
|
has been optimized for chest X-rays, but researchers have reported success |
|
using it for other types of X-rays, including X-rays of other body parts and |
|
even veterinary X-rays. Some example applications include: |
|
|
|
#### Data-efficient classification: |
|
|
|
With a small amount of labeled data, you can train a classifier model on top of CXR Foundation embeddings (ELIXR v2.0). Furthermore, each embedding can be used downstream as an input for a variety of different classifiers with very little additional compute. Some example classification tasks are listed below, followed by a minimal training sketch:
|
|
|
* Clinical findings like fracture or pneumothorax |
|
* Determining X-ray image quality |
|
* Determining the X-ray view or body part |
|
* Determining the presence of devices |
|
* Discovering misplaced tubes |
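
The sketch below illustrates data-efficient classification on precomputed embeddings. The file names, label meaning, and choice of scikit-learn are illustrative assumptions, not part of the official workflow:

```python
# Minimal sketch: fit a small classifier on precomputed CXR Foundation
# embeddings. Assumes "embeddings.npy" holds ELIXR v2.0 embeddings of shape
# (num_images, 32, 768) and "labels.npy" holds binary finding labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

embeddings = np.load("embeddings.npy")   # placeholder file name
labels = np.load("labels.npy")           # placeholder file name

# Flatten each 32x768 embedding into a single feature vector per image.
features = embeddings.reshape(len(embeddings), -1)

x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels
)

clf = LogisticRegression(max_iter=1000)
clf.fit(x_train, y_train)
print("Test AUC:", roc_auc_score(y_test, clf.predict_proba(x_test)[:, 1]))
```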
|
|
|
#### Zero-shot classification |
|
|
|
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can get a classification score from textual prompts without any additional training data. Zero-shot classification works by measuring the relative distance of the image embedding from a positive text prompt (e.g., "pleural effusion present") and a negative text prompt (e.g., "normal X-ray"). The use cases are the same as for data-efficient classification but require no training data. The zero-shot method tends to outperform data-efficient classification when only small amounts of labeled data are available, while data-efficient classification tends to exceed zero-shot performance with larger amounts of data. See the [ELIXR paper](https://arxiv.org/pdf/2308.01317) for more details.
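
The scoring idea can be sketched as follows. This is an assumption-laden illustration, not the exact ELIXR procedure: it assumes the image and the two prompts have already been embedded with the contrastive model and mean-pooled into 1-D vectors, and the file names are placeholders.

```python
# Minimal zero-shot scoring sketch on pooled contrastive embeddings.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_score(image_emb, positive_emb, negative_emb) -> float:
    """Higher values mean the image is closer to the positive prompt."""
    return cosine(image_emb, positive_emb) - cosine(image_emb, negative_emb)

image_emb = np.load("image_contrastive_emb.npy")   # e.g. pooled 128-d vector
positive_emb = np.load("effusion_prompt_emb.npy")  # "pleural effusion present"
negative_emb = np.load("normal_prompt_emb.npy")    # "normal X-ray"
print(zero_shot_score(image_emb, positive_emb, negative_emb))
```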
|
|
|
#### Semantic image retrieval |
|
|
|
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can rank a set of X-rays by relevance to a text search query. As with zero-shot classification, language-based image retrieval relies on the distance between the embeddings of the images in the set and the text embedding of the search query.
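
A minimal retrieval sketch under the same assumptions as above (pooled contrastive embeddings, placeholder file names):

```python
# Minimal retrieval sketch: rank images by cosine similarity between their
# contrastive embeddings and the embedding of a text query.
import numpy as np

def rank_images(query_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Return image indices ordered from most to least similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return np.argsort(-(imgs @ q))

query_emb = np.load("query_emb.npy")      # placeholder, shape (128,)
image_embs = np.load("image_embs.npy")    # placeholder, shape (N, 128)
print(rank_images(query_emb, image_embs)[:5])  # indices of the top-5 matches
```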
|
|
|
### Benefits |
|
|
|
* CXR Foundation embeddings enable efficient development of AI models for chest X-ray image analysis with significantly less data and compute than traditional methods.
|
|
|
* By leveraging the large dataset that CXR Foundation was pre-trained on, users not only need less data but can also build more generalizable models than they could by training on more limited datasets alone.
|
|
|
### Limitations |
|
|
|
The following are known factors that might limit the generalizability or |
|
usefulness of the model output for application in downstream tasks: |
|
|
|
* The model was trained using only de-identified data from the US and India |
|
and may not generalize well to data from other countries, patient |
|
populations, or manufacturers not used in training. |
|
|
|
* The model has only been validated for a limited number of the many potential |
|
downstream tasks involving chest radiographs. |
|
|
|
* Image quality and resolution can affect results; an input resolution of at least 1024x1024 is recommended.
|
|
|
* The model is only used to generate embeddings of user-provided data. It does |
|
not generate any predictions or diagnoses on its own.
|
|
|
* Task-specific validation remains an important aspect of downstream model |
|
development by the end user. |
|
|
|
* As with any research, developers should ensure that any downstream |
|
application is validated to understand performance using data that is |
|
appropriately representative of the intended use setting for the specific |
|
application (e.g., age, sex, gender, condition, scanner, etc.). |