LAPVQA — Radiology Report Generation (Native / End-to-end)

Description

Radiology report generation (RRG) decoder heads, each trained end-to-end together with its vision encoder. Each checkpoint is a Python dict with the keys `state_dict`, `vis_dim`, `d_model`, `num_layers`, `nhead`, `encoder`, `epoch`, and `val_bleu4`.

File	Encoder	vis_dim
`clip-vit-l14.pt`	CLIP ViT-L/14 (fine-tuned)	1024
`siglip.pt`	SigLIP (fine-tuned)	1152
`florence2.pt`	Florence-2 (fine-tuned)	1024
`coca.pt`	CoCa (fine-tuned)	768
`mae-vit-l16.pt`	MAE ViT-L/16 (fine-tuned)	1024
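
Since every checkpoint carries its own `vis_dim`, a quick consistency check against the table can catch a mislabeled or truncated download before the head is built. The following is a minimal sketch, not part of the `lapvqa` package: `check_checkpoint`, `EXPECTED_VIS_DIM`, and `REQUIRED_KEYS` are hypothetical names introduced here, and the check works on any plain dict, so it does not need PyTorch.

```python
# Expected visual feature width per checkpoint file, copied from the table above.
EXPECTED_VIS_DIM = {
    "clip-vit-l14.pt": 1024,
    "siglip.pt": 1152,
    "florence2.pt": 1024,
    "coca.pt": 768,
    "mae-vit-l16.pt": 1024,
}

# Keys the description says every checkpoint dict contains.
REQUIRED_KEYS = {
    "state_dict", "vis_dim", "d_model", "num_layers",
    "nhead", "encoder", "epoch", "val_bleu4",
}


def check_checkpoint(filename, ckpt):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(ckpt))
    if missing:
        problems.append(f"missing keys: {', '.join(missing)}")
    expected = EXPECTED_VIS_DIM.get(filename)
    if expected is None:
        problems.append(f"unknown checkpoint file: {filename}")
    elif "vis_dim" in ckpt and ckpt["vis_dim"] != expected:
        problems.append(f"vis_dim is {ckpt['vis_dim']}, expected {expected}")
    return problems
```

Calling `check_checkpoint("mae-vit-l16.pt", ckpt)` on a dict returned by `torch.load` gives an empty list when the keys and `vis_dim` match the table.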

Results (MIMIC-CXR test set, MAE ViT-L/16 checkpoint)

BLEU-4	ROUGE-L	RadGraph-s
0.032	0.164	0.195

Loading

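import torch
from lapvqa.rrg.heads import ReportGenerationHead

# Load the checkpoint dict on CPU; it holds the weights and the head's hyperparameters.
ckpt = torch.load("mae-vit-l16.pt", map_location="cpu")

# Rebuild the head from the hyperparameters stored in the checkpoint.
head = ReportGenerationHead(
    vis_dim=ckpt["vis_dim"],
    d_model=ckpt["d_model"],
    num_layers=ckpt["num_layers"],
    nhead=ckpt["nhead"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()  # switch to inference mode (disables dropout)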
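
The same loading pattern applies to the other checkpoint files, so it can help to separate the constructor arguments from the bookkeeping fields. The sketch below is illustrative only: `head_kwargs` and `describe` are hypothetical helpers, and it assumes `val_bleu4` is stored as a float, which the checkpoint description does not state.

```python
# Constructor arguments used in the loading snippet above.
HEAD_ARGS = ("vis_dim", "d_model", "num_layers", "nhead")


def head_kwargs(ckpt):
    """Hypothetical helper: pick the head's constructor arguments out of a checkpoint dict."""
    return {name: ckpt[name] for name in HEAD_ARGS}


def describe(ckpt):
    """Hypothetical helper: one-line summary of the bookkeeping fields.

    Assumes val_bleu4 is a float.
    """
    return (
        f"encoder={ckpt['encoder']} "
        f"epoch={ckpt['epoch']} "
        f"val_bleu4={ckpt['val_bleu4']:.3f}"
    )
```

With these, `ReportGenerationHead(**head_kwargs(ckpt))` is equivalent to the explicit keyword call in the loading snippet, and `describe(ckpt)` shows which encoder and epoch a file came from.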

