BLIP - Chest X-Ray Captioning (MIMIC-CXR Fine-Tune)

Salesforce/blip-image-captioning-base, fine-tuned on the MIMIC-CXR dataset for automatic chest X-ray report generation.

Dataset

  • Source: itsanmolgupta/mimic-cxr-dataset
  • Train split: 19,600 samples (indices 9,000–28,600)
  • Validation split: 2,000 samples (indices 28,600–30,600)
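
The split boundaries can be sanity-checked with a few lines of Python. This is a minimal sketch, not the training code: it assumes the index ranges are half-open (start included, stop excluded), which is the reading that matches the stated sample counts. The names `TRAIN_RANGE`, `VAL_RANGE`, `split_sizes`, and `splits_overlap` are hypothetical.

```python
# Assumption: the index ranges are half-open, [start, stop), which is the
# reading that reproduces the stated sample counts.
TRAIN_RANGE = range(9_000, 28_600)
VAL_RANGE = range(28_600, 30_600)


def split_sizes() -> dict:
    """Return the number of samples in each split."""
    return {"train": len(TRAIN_RANGE), "validation": len(VAL_RANGE)}


def splits_overlap() -> bool:
    """Check whether any index appears in both splits."""
    return TRAIN_RANGE.stop > VAL_RANGE.start and VAL_RANGE.stop > TRAIN_RANGE.start


print(split_sizes())     # {'train': 19600, 'validation': 2000}
print(splits_overlap())  # False
```

If the dataset is loaded with the `datasets` library, passing these ranges to `Dataset.select` would likely reproduce the subsets. That assumes the Hub dataset exposes a single split containing all rows, which this card does not state.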

Usage

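from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load the fine-tuned processor and model from the Hugging Face Hub.
repo_id = "AliFadel/blip-cxr-mimic-finetuned"
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForImageTextToText.from_pretrained(repo_id)
model.eval()  # inference mode: disables dropout

# Convert to RGB so the image has three channels.
image = Image.open("chest_xray.jpg").convert("RGB")
# "Chest X-Ray" is the text prompt that conditions the caption.
inputs = processor(images=image, text="Chest X-Ray", return_tensors="pt")

with torch.no_grad():
    ids = model.generate(
        **inputs,
        num_beams=5,             # beam search with 5 beams
        max_length=150,          # cap on sequence length, in tokens
        repetition_penalty=1.5,  # discourage repeated phrases
        # top_p=0.95 was dropped: nucleus sampling only applies when
        # do_sample=True, so it has no effect in plain beam search.
    )

caption = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(caption)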
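
Because generation is conditioned on the text prompt "Chest X-Ray", the decoded caption may begin with an echo of that prompt, possibly lowercased and with extra spaces around the hyphen. Whether it does for this checkpoint is an assumption, not something the card states. The sketch below shows one way to strip such a prefix; `strip_prompt` is a hypothetical helper, not part of `transformers`.

```python
import re


def strip_prompt(caption: str, prompt: str = "Chest X-Ray") -> str:
    """Remove a leading echo of the prompt from a decoded caption, if present.

    Matching ignores case and tolerates whitespace between any two
    characters of the prompt (e.g. "chest x - ray").
    """
    chars = [re.escape(c) for c in prompt if not c.isspace()]
    pattern = (
        r"^\s*"
        + r"\s*".join(chars)
        + r"(?!\w)"           # do not cut into a longer word such as "x-rays"
        + r"\s*[:,.\-]?\s*"   # optional separator after the prompt
    )
    return re.sub(pattern, "", caption, count=1, flags=re.IGNORECASE)


print(strip_prompt("chest x - ray no acute cardiopulmonary process."))
print(strip_prompt("no focal consolidation."))
```

Captions that do not start with the prompt are returned unchanged, so the helper can be applied unconditionally to `caption` from the snippet above.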

Intended Use

This model is intended for research purposes only. It should not be used as a clinical diagnostic tool without expert medical supervision.

Model Details

  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32