---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---
# LAPVQA — Differential VQA (Native / End-to-end)
Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
## Description
DiffVQA models trained **end-to-end**, with the encoder and head optimized jointly. Each `.pt` file
is a plain state dict for `DiffVQAHead`. MAE ViT-L/16 is the primary encoder studied.
## Results (test set, MAE-ViT-L/16)
| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.472 | 0.573 | 0.288 | 0.938 |

Available checkpoints and the `vis_dim` each one is listed with:

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14_best.pt` | CLIP ViT-L/14 | 1024 |
| `coca_best.pt` | CoCa | 768 |
| `florence2_best.pt` | Florence-2 | 1024 |
| `mae-vit-l16_best.pt` | MAE ViT-L/16 | 1024 |
| `siglip_best.pt` | SigLIP | 1152 |
## Loading
```python
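import torch
from lapvqa.diffvqa.model import DiffVQAHead

# Each checkpoint is a plain state dict, so weights_only=True (PyTorch >= 1.13) is sufficient.
ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu", weights_only=True)

# vis_dim must match the value listed for the checkpoint (1024 for MAE ViT-L/16).
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch to inference mode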
```
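Since `vis_dim` differs across checkpoints, it may be convenient to look it up from the file name rather than hard-coding it. The following is a minimal sketch: the `VIS_DIM_BY_CHECKPOINT` mapping and the `vis_dim_for` helper are hypothetical names that are not part of `lapvqa`, and the values are simply copied from the checkpoint table in this README.

```python
from pathlib import Path

# Hypothetical lookup table (not part of lapvqa); values copied from the
# checkpoint table in this README.
VIS_DIM_BY_CHECKPOINT = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path: str) -> int:
    """Return the vis_dim listed for a checkpoint, keyed by its file name."""
    name = Path(checkpoint_path).name
    try:
        return VIS_DIM_BY_CHECKPOINT[name]
    except KeyError:
        known = ", ".join(sorted(VIS_DIM_BY_CHECKPOINT))
        raise ValueError(f"Unknown checkpoint {name!r}; expected one of: {known}") from None


print(vis_dim_for("checkpoints/siglip_best.pt"))  # 1152
```

The result can then be passed straight to the constructor, for example `DiffVQAHead(vis_dim=vis_dim_for(path))`, assuming the constructor accepts `vis_dim` as in the loading snippet.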