---
license: other
license_name: hai-def
license_link: https://developers.google.com/health-ai-developer-foundations/terms
language:
- en
tags:
- medical
- x-ray
- chest-x-ray
- medical-embeddings
extra_gated_heading: Access CXR Foundation on Hugging Face
extra_gated_prompt: >-
To access CXR Foundation on Hugging Face, you're required to review and
agree to [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
To do this, please ensure you’re logged in to Hugging Face and click below.
Requests are processed immediately.
extra_gated_button_content: Acknowledge license
library_name: cxr-foundation
---
# CXR Foundation model card
**Model documentation**:
[CXR Foundation](https://developers.google.com/health-ai-developer-foundations/cxr-foundation)
**Resources**:
* Model on Google Cloud Model Garden:
[CXR Foundation](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation)
* Model on Hugging Face:
[google/cxr-foundation](https://huggingface.co/google/cxr-foundation)
* GitHub repository (supporting code, Colab notebooks, discussions, and
issues): [cxr-foundation](https://github.com/google-health/cxr-foundation)
* Quick start notebook:
[notebooks/quick_start](https://github.com/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb)
* Support: See
[Contact](https://developers.google.com/health-ai-developer-foundations/cxr-foundation/get-started.md#contact).
**Terms of use**:
[Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms)
**Author**: Google
## Model information
This section describes the CXR Foundation model and how to use it.
### Description
CXR Foundation is a machine learning model designed to accelerate AI development
for chest X-ray image analysis. It is pre-trained on a large corpus of chest
X-rays to produce embeddings that capture dense features relevant to analyzing
these images. As a result, the embeddings CXR Foundation produces enable
efficient training of AI models with significantly less data and compute than
traditional methods. CXR Foundation offers two types of embeddings:
* ELIXR v2.0: Produces 32x768 dimensional vectors, capturing detailed image
features relevant to X-ray analysis.
* ELIXR-contrastive / v2.0 text: Generates 32x128 dimensional vectors and
allows for projecting chest X-ray images and textual prompts into a shared
embedding space. This enables powerful applications like semantic image
retrieval and zero-shot classification.
You can read more about the research behind CXR Foundation in our manuscript:
[ELIXR: Towards a general purpose X-ray artificial intelligence system through
alignment of large language models and radiology vision
encoders](https://arxiv.org/abs/2308.01317).
### How to use
To get started quickly with Hugging Face, refer to the Quick start notebook
in the next section.
If you want to use the model at scale, we recommend that you create a production
version using
[Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/cxr-foundation).
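As a rough illustration, the sketch below shows one way to fetch the gated weights from Hugging Face and load a TensorFlow SavedModel. The sub-directory name is a placeholder, not the repository's actual layout; the quick start notebook linked below is the authoritative reference for loading and running the model.

```python
# Minimal sketch, assuming the repository ships TensorFlow SavedModel directories.
# Requires prior authentication (e.g. `huggingface-cli login`) and accepting the
# Health AI Developer Foundations terms on the model page.
from huggingface_hub import snapshot_download
import tensorflow as tf

local_dir = snapshot_download(repo_id="google/cxr-foundation")

# "path/to/savedmodel" is a placeholder -- inspect `local_dir` for the actual
# SavedModel directory names shipped in the repository.
model = tf.saved_model.load(f"{local_dir}/path/to/savedmodel")
```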
### Examples
See the following Colab notebooks for examples of how to use CXR Foundation:
* To give the model a quick try by running it locally with weights from Hugging
  Face, see
  [Quick start notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/quick_start_with_hugging_face.ipynb).
* For an example of how to use the model to train a linear classifier, see
  [Linear classifier notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/train_data_efficient_classifier.ipynb).
* For an example of how to retrieve images from a database using text-image
  similarity, see
  [Text retrieval notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/retrieve_images_by_text.ipynb).
* For an example of how to use the text embeddings to perform zero-shot
  inference, see
  [Zero-shot inference notebook in Colab](https://colab.research.google.com/github/google-health/cxr-foundation/blob/master/notebooks/classify_images_with_natural_language.ipynb).
### Model architecture overview
The model uses the
[EfficientNet-L2 architecture](https://arxiv.org/pdf/1911.04252v4.pdf) and
[BERT architecture](https://arxiv.org/abs/1810.04805). It was trained on 821,544
CXRs from India and the US. Image-level abnormal vs. normal labels (i.e., whether
the image contained any kind of abnormality) were used with the
[Supervised Contrastive loss](https://arxiv.org/abs/2004.11362v1), and the
accompanying radiology reports were used with the
[CLIP loss](https://arxiv.org/pdf/2103.00020.pdf) and
[BLIP-2 losses](https://arxiv.org/abs/2301.12597). The abnormal vs. normal
labels were obtained from more granular labels (e.g., pneumothorax, fracture) as
well as
[regular expressions on radiology reports](https://pubmed.ncbi.nlm.nih.gov/34471144/).
You can read more about the research behind CXR Foundation in our recent
publication:
[Simplified Transfer Learning for Chest Radiography Models Using Less Data.](https://pubs.rsna.org/doi/10.1148/radiol.212482)
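For intuition about the contrastive objectives named above, here is a rough sketch of a CLIP-style image-text loss in TensorFlow. This is not the actual training code; the function and its arguments are our own illustration of the general form of the objective.

```python
import tensorflow as tf

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text embeddings.

    image_emb, text_emb: [batch, dim] embeddings of matched image-report pairs.
    """
    image_emb = tf.math.l2_normalize(image_emb, axis=-1)
    text_emb = tf.math.l2_normalize(text_emb, axis=-1)
    # Cosine-similarity logits; matching pairs lie on the diagonal.
    logits = tf.matmul(image_emb, text_emb, transpose_b=True) / temperature
    labels = tf.range(tf.shape(logits)[0])
    loss_img = tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)
    loss_txt = tf.keras.losses.sparse_categorical_crossentropy(
        labels, tf.transpose(logits), from_logits=True)
    return tf.reduce_mean(loss_img + loss_txt) / 2.0
```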
### Technical specifications
* Model type: Convolutional neural network that produces embeddings
* Key publications:
* [Simplified Transfer Learning for Chest Radiography Models Using Less
Data](https://pubs.rsna.org/doi/10.1148/radiol.212482)
* [ELIXR: Towards a general purpose X-ray artificial intelligence system
through alignment of large language models and radiology vision
encoders](https://arxiv.org/abs/2308.01317)
* Model created: August 2, 2024
* Model version: 2.0.0
### Performance and validation
CXR Foundation was evaluated across a range of tasks, including data-efficient
classification, zero-shot classification, semantic image retrieval, visual
question answering, and report quality assurance.
### Key performance metrics
* Data-efficient classification: **Mean AUC of 0.898** (across atelectasis,
  cardiomegaly, consolidation, pleural effusion, and pulmonary edema) on the
  CheXpert test set
* Zero-shot classification: **Mean AUC of 0.846 across 13 findings** on the
  CheXpert test set. Findings included: atelectasis, cardiomegaly, consolidation,
  pleural effusion, pulmonary edema, enlarged cardiomediastinum, pleural other,
  pneumothorax, support devices, airspace opacity, lung lesion, pneumonia, and
  fracture.
* Semantic image retrieval: **0.76 normalized discounted cumulative gain
(NDCG) @5** across 19 queries for semantic image retrieval, including
perfect retrieval on 12 of them.
* Reference: [ELIXR: Towards a general purpose X-ray artificial intelligence
system through alignment of large language models and radiology vision
encoders](https://arxiv.org/pdf/2308.01317)
### Inputs and outputs
* **Input**: Serialized `tf.Example` with the bytes of a `PNG` image written in
  the `image/encoded` feature key (see the sketch below).
* **Output**: Embedding (a vector of floating-point values representing a
  projection of the original image into a compressed feature space).
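The input format above can be produced with standard TensorFlow APIs. Below is a minimal sketch that packages a PNG file into a serialized `tf.Example` under the `image/encoded` feature key; any additional preprocessing (resizing, DICOM conversion) shown in the notebooks is omitted here.

```python
import tensorflow as tf

def make_serialized_example(png_path: str) -> bytes:
    """Wrap raw PNG bytes in a serialized tf.Example under `image/encoded`."""
    with open(png_path, "rb") as f:
        png_bytes = f.read()
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image/encoded": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[png_bytes])
                )
            }
        )
    )
    return example.SerializeToString()
```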
## Dataset details
### Training dataset
CXR Foundation was trained using the following de-identified datasets:
* MIMIC-CXR, comprising 243,324 images of 60,523 unique patients (cited
  below);
* A private US dataset from an AMC in Illinois, comprising 165,182 images of
  12,988 unique patients; and
* A private Indian dataset from five hospitals, comprising 485,082 images of
  348,335 unique patients.
### Labeling
Supervised learning was used to derive abnormal vs. normal labels from radiology
reports.
A medically tuned LLM, Med-PaLM 2, was then applied to ensure that the labels
were consistent with the reports, and a board-certified thoracic radiologist (CL)
adjudicated cases where the LLM results differed from the ground truth in
MIMIC-CXR.
*Additional information about data and labels used to evaluate CXR Foundation
for downstream tasks can be found in the following references:*
- [Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest
  Radiography Models Using Less Data. Radiology.
  2022.](https://pubs.rsna.org/doi/full/10.1148/radiol.212482) (see Tables 1, 2,
  and 3)
- [https://github.com/google-research/google-research/tree/master/supcon](https://github.com/google-research/google-research/tree/master/supcon)
## License
The use of CXR Foundation is governed by the
[Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
## Data citation
- [MIMIC-CXR Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S.
(2024). MIMIC-CXR Database (version 2.1.0).
PhysioNet.](https://doi.org/10.13026/4jqj-jw95)
- [Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a
de-identified publicly available database of chest radiographs with
free-text reports. Sci Data 6, 317
(2019).](https://doi.org/10.1038/s41597-019-0322-0)
- Available on PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J.,
  Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). [PhysioBank,
  PhysioToolkit, and PhysioNet: Components of a new research resource for
  complex physiologic signals. Circulation [Online]. 101 (23), pp.
  e215–e220.](https://pubmed.ncbi.nlm.nih.gov/10851218/)
## Implementation information
Details about the model internals.
### Software
Training was done using [JAX](https://github.com/google/jax).
JAX allows researchers to take advantage of the latest generation of hardware,
including TPUs, for faster and more efficient training of large models.
## Use and limitations
### Intended use
* CXR Foundation can reduce the training data, compute, and technical
expertise necessary to develop AI applications for radiographs. The model
has been optimized for chest X-rays, but researchers have reported success
using it for other types of X-rays, including X-rays of other body parts and
even veterinary X-rays. Some example applications include:
#### Data-efficient classification
With a small amount of labeled data, you can train a classifier model on top of
CXR Foundation embeddings (ELIXR v2.0). Furthermore, each embedding can be used
downstream as an input for a variety of different classifiers, with very little
additional compute (a minimal training sketch follows the example tasks below).
Below are some example classification tasks:
* Clinical findings like fracture or pneumothorax
* Determining X-ray image quality
* Determining the X-ray view or body part
* Determining the presence of devices
* Discovering misplaced tubes
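As referenced above, here is a minimal sketch of a linear probe trained on precomputed ELIXR v2.0 embeddings (shape 32x768 per image). Mean-pooling the 32 token embeddings and the use of scikit-learn are our own simplifications; the linear classifier notebook shows the supported workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool(embedding: np.ndarray) -> np.ndarray:
    """Mean-pool a (32, 768) ELIXR v2.0 embedding into a single (768,) vector."""
    return embedding.mean(axis=0)

def train_linear_probe(embeddings, labels):
    """embeddings: list of (32, 768) arrays; labels: 0/1 per image (e.g. fracture present)."""
    features = np.stack([pool(e) for e in embeddings])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```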
#### Zero-shot classification
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can obtain a
classification score from textual prompts alone, without any additional training
data. Zero-shot classification works by measuring the relative distance of the
image embedding from a positive text prompt (e.g., "pleural effusion present")
and a negative text prompt (e.g., "normal X-ray"). The use cases are the same as
for data-efficient classification but don't require data to train. The zero-shot
method will outperform data-efficient classification at low levels of training
data, while data-efficient classification will tend to exceed zero-shot
performance with larger amounts of data. See the
[ELIXR paper](https://arxiv.org/pdf/2308.01317) for more details.
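A minimal sketch of this scoring scheme, assuming the image and text contrastive embeddings have already been pooled to single vectors (our simplification), could look like this; the zero-shot notebook is the authoritative reference.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_score(image_emb, pos_text_emb, neg_text_emb) -> float:
    """Higher score means the image is closer to the positive prompt
    (e.g. "pleural effusion present") than to the negative one (e.g. "normal X-ray")."""
    return cosine(image_emb, pos_text_emb) - cosine(image_emb, neg_text_emb)
```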
#### Semantic image retrieval
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can rank a
set of X-rays against a search query. Similar to zero-shot classification,
language-based image retrieval relies on the distance between the embeddings of
the set of images and the text embedding of the search query.
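As a sketch under the same pooling assumption, ranking a set of image embeddings against a text query reduces to sorting by cosine similarity:

```python
import numpy as np

def rank_images(image_embs: np.ndarray, query_text_emb: np.ndarray) -> np.ndarray:
    """image_embs: [num_images, dim]; query_text_emb: [dim]. Returns indices, best match first."""
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    query = query_text_emb / np.linalg.norm(query_text_emb)
    return np.argsort(-(imgs @ query))
```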
### Benefits
* CXR Foundation embeddings can be used to develop AI models for chest X-ray
  image analysis with significantly less data and compute than traditional
  methods.
* By leveraging the large dataset that CXR Foundation was pre-trained on, users
  need less data and can also build more generalizable models than by training
  on more limited datasets.
### Limitations
The following are known factors that might limit the generalizability or
usefulness of the model output for application in downstream tasks:
* The model was trained using only de-identified data from the US and India
and may not generalize well to data from other countries, patient
populations, or manufacturers not used in training.
* The model has only been validated for a limited number of the many potential
downstream tasks involving chest radiographs.
* Image quality and resolution can limit performance; a minimum resolution of
  1024x1024 is recommended.
* The model is only used to generate embeddings of user-provided data. It does
not generate any predictions or diagnosis on its own.
* Task-specific validation remains an important aspect of downstream model
development by the end user.
* As with any research, developers should ensure that any downstream
application is validated to understand performance using data that is
appropriately representative of the intended use setting for the specific
application (e.g., age, sex, gender, condition, scanner, etc.).