---
license: mit
language:
- en
tags:
- medical
- vision
widget:
- src: "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic9078.jpg"
  candidate_labels: "Chest X-Ray, Brain MRI, Abdomen CT Scan"
  example_title: "Abdomen CT Scan"
---

# Model Card for PubMedCLIP

PubMedCLIP is a fine-tuned version of [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for the medical domain.

## Model Description

PubMedCLIP was trained on the [Radiology Objects in COntext (ROCO)](https://github.com/razorx89/roco-dataset) dataset, a large-scale multimodal medical imaging dataset. The ROCO dataset includes diverse imaging modalities (such as X-ray, MRI, ultrasound, and fluoroscopy) from various human body regions (such as the head, spine, chest, and abdomen), captured from open-access [PubMed](https://pubmed.ncbi.nlm.nih.gov/) articles.
The authors of PubMedCLIP have released three different pre-trained models at this [link](https://1drv.ms/u/s!ApXgPqe9kykTgwD4Np3-f7ODAot8?e=zLVlJ2), which use ResNet-50, ResNet-50x4, and ViT32 (ViT-B/32) as image encoders. This repository includes only the ViT32 variant of the PubMedCLIP model.
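To confirm which backbone a downloaded checkpoint uses, the vision tower's configuration can be inspected. The snippet below is a minimal sketch assuming the checkpoint follows the standard `transformers` CLIP configuration; the values in the comments are the ones expected for a ViT-B/32 image encoder.

```python
from transformers import CLIPConfig

# Load only the configuration (no model weights are downloaded).
config = CLIPConfig.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")

# Expected values for a ViT-B/32 image encoder:
print(config.vision_config.patch_size)   # 32
print(config.vision_config.hidden_size)  # 768
print(config.vision_config.image_size)   # 224
```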
- **Repository:** [PubMedCLIP Official GitHub Repository](https://github.com/sarahESL/PubMedCLIP)
- **Paper:** [Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?](https://arxiv.org/abs/2112.13906)

## Use with Transformers

```python
import requests
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")

url = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic9078.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["Chest X-Ray", "Brain MRI", "Abdominal CT Scan"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)      # we can take the softmax to get the label probabilities
```

The same task can also be run through the high-level `zero-shot-image-classification` pipeline; a sketch is included at the end of this card.

## Additional Information

### Licensing Information

The authors have released the model code and pre-trained checkpoints under the [MIT License](https://github.com/sarahESL/PubMedCLIP/blob/main/LICENSE).

### Citation Information

```
@article{eslami2021does,
  title={Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?},
  author={Eslami, Sedigheh and de Melo, Gerard and Meinel, Christoph},
  journal={arXiv preprint arXiv:2112.13906},
  year={2021}
}
```
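As referenced above, the same zero-shot classification can be run with the `pipeline` API instead of calling the processor and model directly. This is a minimal sketch that reuses the image URL and candidate labels from the example above.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification", model="flaviagiammarino/pubmed-clip-vit-base-patch32")

url = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic9078.jpg"
results = classifier(url, candidate_labels=["Chest X-Ray", "Brain MRI", "Abdominal CT Scan"])

# Each result is a dict with a "label" and its "score".
for result in results:
    print(f"{result['label']}: {result['score']:.4f}")
```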