---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---

# LAPVQA — Differential VQA (Native / End-to-end)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

DiffVQA models trained **end-to-end** (encoder + head jointly). Each `.pt` file is a plain state dict of `DiffVQAHead`. MAE-ViT-L/16 is the primary encoder studied.

## Results (test set, MAE-ViT-L/16)

| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.472 | 0.573 | 0.288 | 0.938 |

Available checkpoints and the `vis_dim` of each encoder:

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
| `coca_best.pt` | CoCa | 768 |
| `florence2_best.pt` | Florence-2 | 1024 |
| `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
| `siglip_best.pt` | SigLIP | 1152 |

## Loading

```python
import torch
from lapvqa.diffvqa.model import DiffVQAHead

ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu")
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()
```
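
The table lists a different `vis_dim` for some encoders, so the value passed to `DiffVQAHead` presumably has to match the checkpoint being loaded. The sketch below is one way to keep the two in sync. `VIS_DIMS`, `vis_dim_for`, and `load_head` are hypothetical names introduced here for illustration and are not part of the `lapvqa` package. It also assumes `DiffVQAHead` needs no constructor arguments other than `vis_dim`, as in the snippet above.

```python
from pathlib import Path

# vis_dim per checkpoint file, copied from the table in this README.
VIS_DIMS = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path):
    """Return the vis_dim listed for a checkpoint, keyed by its file name."""
    name = Path(checkpoint_path).name
    try:
        return VIS_DIMS[name]
    except KeyError:
        raise ValueError(f"unknown checkpoint file: {name}") from None


def load_head(checkpoint_path):
    """Hypothetical helper: build DiffVQAHead with the matching vis_dim and load its weights."""
    # Imported lazily so the lookup above works without torch or lapvqa installed.
    import torch
    from lapvqa.diffvqa.model import DiffVQAHead

    state_dict = torch.load(checkpoint_path, map_location="cpu")
    head = DiffVQAHead(vis_dim=vis_dim_for(checkpoint_path))
    head.load_state_dict(state_dict)
    return head.eval()
```

With this in place, `load_head("siglip_best.pt")` would construct the head with `vis_dim=1152` rather than the 1024 used for MAE ViT-L/16, and an unrecognized file name raises a `ValueError` instead of failing later with a shape mismatch.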