CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging
Paper • arXiv:2604.22989
CheXmix (CVPR Findings 2026) is a unified early-fusion generative model trained on a large corpus of chest X-rays paired with radiology reports. This Hugging Face repository provides the model weights and an example file.
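This card does not spell out CheXmix's exact tokenization. The sketch below only illustrates the general early-fusion idea: an image tokenizer (here assumed to be the VQGAN whose checkpoint, `vqgan.ckpt`, ships in this repository) turns an X-ray into discrete codes. Those codes are placed in the same id space as the report's text tokens, so a single autoregressive model can be trained with next-token prediction on one fused sequence. The vocabulary sizes, the `BOI`/`EOI` marker ids, the image-then-report ordering and all function names are hypothetical and are not taken from the CheXmix code.

```python
from typing import List, Tuple

# Hypothetical vocabulary layout; CheXmix's real sizes and special ids may differ.
TEXT_VOCAB_SIZE = 32_000            # text token ids occupy [0, 32000)
IMAGE_CODEBOOK_SIZE = 8_192         # number of VQGAN codebook entries (assumed)
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # hypothetical begin-of-image token
EOI = BOI + 1                                # hypothetical end-of-image token


def image_codes_to_tokens(codes: List[int]) -> List[int]:
    """Offset VQGAN codebook indices past the text vocabulary so that image
    and text tokens share one id space, then wrap them in BOI/EOI markers."""
    for c in codes:
        if not 0 <= c < IMAGE_CODEBOOK_SIZE:
            raise ValueError(f"codebook index out of range: {c}")
    return [BOI] + [TEXT_VOCAB_SIZE + c for c in codes] + [EOI]


def build_sequence(image_codes: List[int], report_tokens: List[int]) -> List[int]:
    """Early fusion: one flat sequence holding the image tokens followed by
    the report tokens (the ordering here is an assumption)."""
    for t in report_tokens:
        if not 0 <= t < TEXT_VOCAB_SIZE:
            raise ValueError(f"text token out of range: {t}")
    return image_codes_to_tokens(image_codes) + report_tokens


def next_token_pairs(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Inputs and targets for next-token prediction over the fused sequence."""
    return seq[:-1], seq[1:]


# Toy example: three image codes and a three-token "report".
seq = build_sequence([5, 0, 8191], [101, 2045, 7])
inputs, targets = next_token_pairs(seq)
print(seq)  # [40192, 32005, 32000, 40191, 40193, 101, 2045, 7]
```

Because image and text tokens live in one vocabulary, the same loss and the same transformer handle both modalities. That shared-sequence property is what "early fusion" refers to, as opposed to encoding the image separately and injecting its features later.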
To install the code in editable mode, clone the GitHub repository and install it with pip:
git clone https://github.com/StanfordMIMI/CheXmix.git
cd CheXmix
pip install -e .
For usage instructions, see the GitHub repository.
Files in this Hugging Face repository:
.
├── README.md
├── model.safetensors   <CheXmix (S1 + S2) checkpoint>
└── vqgan.ckpt          <image tokenizer checkpoint>
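To check what `model.safetensors` contains before loading it with the code from the GitHub repository, you can read the file header directly. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length, then a JSON header, then raw tensor bytes). It assumes the file has already been downloaded locally, and the helper names `read_safetensors_header`, `count_parameters` and `summarize` are hypothetical, not part of CheXmix.

```python
import json
import struct
from typing import Dict, Tuple


def read_safetensors_header(path: str) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Return (tensors, metadata) from a .safetensors file without reading the weights.

    safetensors layout: an 8-byte little-endian unsigned integer N, then N bytes of
    UTF-8 JSON mapping tensor names to {"dtype", "shape", "data_offsets"}, with an
    optional "__metadata__" entry, followed by the raw tensor bytes.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    metadata = header.pop("__metadata__", {})
    return header, metadata


def count_parameters(tensors: Dict[str, dict]) -> int:
    """Sum the number of elements over all tensors listed in the header."""
    total = 0
    for info in tensors.values():
        n = 1
        for dim in info["shape"]:  # an empty shape (scalar) counts as 1 element
            n *= dim
        total += n
    return total


def summarize(path: str) -> None:
    """Print each tensor's name, dtype and shape, then a parameter total."""
    tensors, metadata = read_safetensors_header(path)
    for name, info in sorted(tensors.items()):
        print(f"{name}: {info['dtype']} {info['shape']}")
    if metadata:
        print(f"metadata: {metadata}")
    print(f"{len(tensors)} tensors, {count_parameters(tensors):,} parameters")
```

For example, `summarize("model.safetensors")` prints one line per tensor followed by a parameter total. Reading only the header avoids loading the weights into memory, which makes it a quick way to confirm that a download is complete and holds the expected tensors.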
If you find this repository useful for your work, please cite the paper:
@inproceedings{kumar2026chexmix,
author = {Kumar, Ashwin and Holland, Robbie and Barrett, Corey and Kim, Jangwon and Varma, Maya and Chen, Zhihong and Gao, Yunhe and Zaharchuk, Greg and Taghavi, Tara and Kenthapadi, Krishnaram and Chaudhari, Akshay},
title = {CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
pages = {9466--9476},
year = {2026},
note = {arXiv preprint arXiv:2604.22989}
}