Part of the LAPVQA collection (14 items): chest X-ray models with pre-trained encoders and task heads for VQA, DiffVQA, RRG, detection, and grounding on MIMIC-CXR.
Radiology report generation (RRG) decoders, trained end-to-end alongside their vision encoders.
Each checkpoint is a dict with the keys `state_dict`, `vis_dim`, `d_model`, `num_layers`, `nhead`, `encoder`, `epoch`, and `val_bleu4`.
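Before building a head, it can help to confirm that a loaded checkpoint actually carries the keys listed above. The following is a minimal sketch, not part of the `lapvqa` package: `EXPECTED_KEYS` and `missing_keys` are hypothetical names introduced here, and the example dict is a stand-in for what `torch.load` would return.

```python
# Keys listed in the checkpoint description above.
EXPECTED_KEYS = (
    "state_dict", "vis_dim", "d_model", "num_layers",
    "nhead", "encoder", "epoch", "val_bleu4",
)


def missing_keys(ckpt: dict) -> list[str]:
    """Return the expected keys that are absent from a checkpoint dict."""
    return [key for key in EXPECTED_KEYS if key not in ckpt]


# Stand-in dict with placeholder values (hypothetical);
# a real one would come from torch.load(...).
example = {key: None for key in EXPECTED_KEYS}
print(missing_keys(example))
```

An empty list means every documented key is present; any names returned point to a checkpoint that does not match the layout described here.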
| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 |
| `siglip.pt` | SigLIP (fine-tuned) | 1152 |
| `florence2.pt` | Florence-2 (fine-tuned) | 1024 |
| `coca.pt` | CoCa (fine-tuned) | 768 |
| `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) | 1024 |
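Since each checkpoint stores its own `vis_dim`, the table can double as a sanity check after download. This is a rough sketch under that assumption: `EXPECTED_VIS_DIM` and `vis_dim_matches` are hypothetical helpers written for illustration, with the values copied from the table.

```python
# vis_dim per checkpoint file, copied from the table above.
EXPECTED_VIS_DIM = {
    "clip-vit-l14.pt": 1024,
    "siglip.pt": 1152,
    "florence2.pt": 1024,
    "coca.pt": 768,
    "mae-vit-l16.pt": 1024,
}


def vis_dim_matches(filename: str, ckpt: dict) -> bool:
    """Return True if the checkpoint's stored vis_dim agrees with the table."""
    return ckpt.get("vis_dim") == EXPECTED_VIS_DIM[filename]


# Stand-in for a loaded checkpoint (hypothetical values for illustration).
print(vis_dim_matches("siglip.pt", {"vis_dim": 1152}))
```

A mismatch would suggest the file was renamed or paired with the wrong encoder, which is worth catching before the visual features are fed to the head.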
| BLEU-4 | ROUGE-L | RadGraph-s |
|---|---|---|
| 0.032 | 0.164 | 0.195 |
```python
import torch
from lapvqa.rrg.heads import ReportGenerationHead

# Load the checkpoint dict onto the CPU.
ckpt = torch.load("mae-vit-l16.pt", map_location="cpu")

# Rebuild the head from the hyperparameters stored in the checkpoint.
head = ReportGenerationHead(
    vis_dim=ckpt["vis_dim"],
    d_model=ckpt["d_model"],
    num_layers=ckpt["num_layers"],
    nhead=ckpt["nhead"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()  # switch to inference mode
```