Model Card for KEEP

Preprint | Github | Webpage | Cite

KEEP (KnowledgE-Enhanced Pathology) is a foundation model designed for cancer diagnosis that integrates disease knowledge into vision-language pre-training. It utilizes a comprehensive disease knowledge graph (KG) containing 11,454 human diseases and 139,143 disease attributes, such as synonyms, definitions, and hierarchical relationships. KEEP reorganizes millions of publicly available, noisy pathology image-text pairs into 143K well-structured semantic groups based on the hierarchical relations of the disease KG. By incorporating disease knowledge into the alignment process, KEEP learns more nuanced image and text representations. The model is validated on 18 diverse benchmarks with over 14,000 whole-slide images (WSIs), demonstrating state-of-the-art performance in zero-shot cancer diagnosis, including an average sensitivity of 89.8% for cancer detection across 7 cancer types. KEEP also generalizes well to rare tumors, achieving strong performance in subtyping rare cancer types.
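
For intuition, each semantic group can be pictured as a small record keyed by one disease concept and its KG synonyms, under which matching noisy captions and images are collected. The sketch below is purely illustrative, with made-up names and a naive substring matcher; it is not the curation pipeline or the knowledge graph described in the paper.

from dataclasses import dataclass, field
from typing import List, Optional

# Toy sketch only: the diseases, synonyms, and caption below are placeholders,
# not entries from KEEP's disease knowledge graph or training data.
@dataclass
class SemanticGroup:
    disease: str                                   # canonical disease concept
    synonyms: List[str]                            # KG synonyms for that concept
    caption: str = ''                              # one cleaned caption per group
    image_paths: List[str] = field(default_factory=list)

def route_caption(noisy_caption: str, groups: List[SemanticGroup]) -> Optional[SemanticGroup]:
    """Assign a noisy caption to the first group whose synonyms it mentions."""
    text = noisy_caption.lower()
    for group in groups:
        if any(s.lower() in text for s in group.synonyms):
            return group
    return None

groups = [
    SemanticGroup('lung adenocarcinoma', ['lung adenocarcinoma', 'luad']),
    SemanticGroup('invasive breast carcinoma', ['breast invasive carcinoma', 'idc']),
]
print(route_caption('H&E slide showing LUAD, 20x magnification', groups).disease)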

Model Details

Model Description

  • Developed by: MAGIC-AI4Med team from Shanghai Jiao Tong University and Shanghai AI Lab.
  • Model type: Vision-language model (vision encoder: ViT-L/16; text encoder: BERT)
  • Pretraining data: 143K pathology semantic groups, each pairing a single caption with multiple images.
  • License: MIT

Direct Use

import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

# Load the KEEP model and tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("Astaxanthin/KEEP", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Astaxanthin/KEEP", trust_remote_code=True)
model.eval()

# Standard ImageNet-style preprocessing at 224x224 resolution
transform = transforms.Compose([
    transforms.Resize(size=224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(size=(224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
])

example_image_path = './example.tif'
example_text = [
    'an H&E image of breast invasive carcinoma.',
    'an H&E image of normal tissue.',
    'an H&E image of lung adenocarcinoma.'
]

# Preprocess the image (adding a batch dimension) and tokenize the candidate prompts
img_input = transform(Image.open(example_image_path).convert('RGB')).unsqueeze(0)
token_input = tokenizer(example_text, max_length=256, padding='max_length',
                        truncation=True, return_tensors='pt')

# Encode image and text without tracking gradients
with torch.no_grad():
    img_feature = model.encode_image(img_input)
    text_feature = model.encode_text(token_input)
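
To turn these embeddings into zero-shot predictions, a common CLIP-style recipe is to L2-normalize both features and rank the candidate prompts by cosine similarity. The snippet below is a sketch of that recipe, not a documented KEEP API; in particular, the temperature of 100.0 is an assumed scale.

import torch.nn.functional as F

# Illustrative zero-shot scoring: cosine similarity between the image embedding
# and each candidate text embedding, softmaxed into pseudo-probabilities.
# The 100.0 temperature is a conventional CLIP-style choice, not a KEEP constant.
img_emb = F.normalize(img_feature, dim=-1)
txt_emb = F.normalize(text_feature, dim=-1)
logits = 100.0 * img_emb @ txt_emb.T          # shape: (1, num_prompts)
probs = logits.softmax(dim=-1).squeeze(0)

for prompt, p in zip(example_text, probs.tolist()):
    print(f'{p:.3f}  {prompt}')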

Evaluation

Testing Data

We present benchmark results for a range of representative tasks. A complete set of benchmarks can be found in the paper. These results will be updated with each new iteration of KEEP.

Results

Zero-shot Cancer Region Segmentation (DICE)

Dataset      PLIP [1]    QuiltNet [2]    MI-Zero (Pub) [3]    CONCH [4]    KEEP (Ours)
CAMELYON16   0.253       0.157           0.186                0.292        0.361
PANDA        0.295       0.309           0.276                0.315        0.334
AGGC22       0.284       0.282           0.324                0.449        0.530

Zero-shot Cancer Detection (AUROC)

Dataset       CHIEF [1]   PLIP [2]   QuiltNet [3]   MI-Zero (Pub) [4]   CONCH [5]   KEEP (Ours)
CPTAC-CM      0.915       0.970      0.972          0.985               0.994       0.994
CPTAC-CCRCC   0.723       0.330      0.755          0.886               0.871       0.999
CPTAC-PDA     0.825       0.391      0.464          0.796               0.920       0.929
CPTAC-UCEC    0.955       0.945      0.973          0.979               0.996       0.998
CPTAC-LSCC    0.901       0.965      0.966          0.910               0.987       0.983
CPTAC-HNSCC   0.946       0.898      0.874          0.918               0.982       0.976
CPTAC-LUAD    0.891       0.988      0.991          0.981               0.999       1.000

Zero-shot Cancer Subtyping (BACC)

Dataset       PLIP [1]    QuiltNet [2]    MI-Zero (Pub) [3]    CONCH [4]    KEEP (Ours)
TCGA-BRCA     0.519       0.500           0.633                0.727        0.774
TCGA-NSCLC    0.699       0.667           0.753                0.901        0.902
TCGA-RCC      0.735       0.755           0.908                0.921        0.926
TCGA-ESCA     0.614       0.746           0.954                0.923        0.977
TCGA-BRAIN    0.361       0.346           0.361                0.453        0.604
UBC-OCEAN     0.343       0.469           0.652                0.674        0.661
CPTAC-NSCLC   0.647       0.607           0.643                0.836        0.863
EBRAINS       0.096       0.093           0.325                0.371        0.456

Summary

Validated on 18 diverse benchmarks with more than 14,000 whole slide images (WSIs), KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks. Notably, for cancer detection, KEEP demonstrates an average sensitivity of 89.8% at a specificity of 95.0% across 7 cancer types, significantly outperforming vision-only foundation models and highlighting its promising potential for clinical application. For cancer subtyping, KEEP achieves a median balanced accuracy of 0.456 in subtyping 30 rare brain cancers, indicating strong generalizability for diagnosing rare tumors.
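
For reference, "sensitivity at 95% specificity" is read off the slide-level ROC curve: take the operating point with specificity of at least 0.95 and report the true-positive rate there. The snippet below is a generic sketch of that computation; the label and score arrays are placeholders, not released evaluation data.

import numpy as np
from sklearn.metrics import roc_curve

# Placeholder slide-level data: y_true is 1 for tumor slides, 0 for normal;
# y_score is the model's predicted tumor probability for each slide.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.92, 0.10, 0.85, 0.40, 0.22, 0.05, 0.77, 0.31])

fpr, tpr, _ = roc_curve(y_true, y_score)
specificity = 1.0 - fpr

# Sensitivity (TPR) at the operating point where specificity is >= 95%
mask = specificity >= 0.95
sens_at_95spec = tpr[mask].max() if mask.any() else 0.0
print(f'Sensitivity at 95% specificity: {sens_at_95spec:.3f}')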

Citation

@article{zhou2024keep,
  title={A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis},
  author={Zhou, Xiao and Sun, Luoyi and He, Dexuan and Guan, Wenbin and Wang, Ruifen and Wang, Lifeng and Sun, Xin and Sun, Kun and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  journal={arXiv preprint arXiv:2412.13126},
  year={2024}
}