MedFuse-Seg: Multi-Level Visual and Semantic Context Fusion for Segmentation-Based Medical Reasoning (MICCAI 2026)
MedFuse-Seg bridges the semantic-spatial gap in language-driven medical image analysis by combining multi-level visual feature injection with LLM-guided mask decoding. Built on MedGemma-4B, MedSigLIP, and MedSAM, the model allows clinicians to obtain both diagnostic reasoning and precise anatomical segmentation through natural language prompts.
For full training details and hyperparameters, see the project repository and the Med-ReasonSeg dataset.
Model Performance
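On the averaged metrics, MedFuse-Seg improves over zero-shot BiomedParse by 13.49 percentage points in DSC and reduces HD95 by 54.04 px. Against fine-tuned LISA-7B (same training setup), it improves DSC by 4.89 percentage points and reduces HD95 by 15.29 px.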
| Method | DSC (Ref) | DSC (Sem) | DSC (Avg) | HD95 (Ref) | HD95 (Sem) | HD95 (Avg) |
|---|---|---|---|---|---|---|
| SAM 3 (zero-shot) | 0.1425 | 0.1167 | 0.1296 | 373.52 | 370.50 | 372.01 |
| BiomedParse (zero-shot) | 0.6703 | 0.6344 | 0.6524 | 105.23 | 115.97 | 110.60 |
| LISA-7B (fine-tuned) | 0.7398 | 0.7370 | 0.7384 | 71.55 | 72.13 | 71.84 |
| MedFuse-Seg (Ours) | 0.7879 | 0.7867 | 0.7873 | 56.46 | 56.65 | 56.55 |
Evaluated on the Med-ReasonSeg test set. Higher DSC is better; lower HD95 (in pixels) is better. LISA-7B was retrained with an identical training setup for a fair comparison.
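For readers less familiar with the overlap metric in the table, DSC (Dice similarity coefficient) between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). The following is a minimal sketch of that formula on toy masks. The function name `dice_score`, the toy arrays, and the choice to score two empty masks as 1.0 are illustrative assumptions, not taken from the MedFuse-Seg evaluation code, which may differ in such details.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy example: a 2x2 prediction inside a 2x3 reference region.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1      # 4 foreground pixels
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:4] = 1    # 6 foreground pixels, 4 of them shared with pred

score = dice_score(pred, target)  # 2 * 4 / (4 + 6) = 0.8
print(f"DSC = {score:.4f}")
```

A per-case score like this would typically be averaged over the test set to obtain figures comparable to the DSC columns above.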
Download & Use
1. Install dependencies
git clone https://github.com/biodatlab/medfuse-seg.git
cd medfuse-seg
pip install -r requirements.txt
2. Download MedSAM checkpoint (required)
Download from the original MedSAM paper's repository:
gdown "https://drive.google.com/uc?id=1UAmWL88roYR7wKlnApw5Bcuzf2iQgk6_"
Place medsam_vit_b.pth in the repository root.
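Before moving on, it can help to confirm that the checkpoint actually landed where the instructions expect it. The snippet below is a small pre-flight check using only the standard library. The helper name `find_missing_files` is hypothetical and not part of the medfuseseg package; the only file name it assumes is `medsam_vit_b.pth`, as stated above.

```python
from pathlib import Path

# File the step above expects in the repository root.
REQUIRED_FILES = ("medsam_vit_b.pth",)


def find_missing_files(root, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files under `root`."""
    root = Path(root)
    return [name for name in required if not (root / name).is_file()]


# Run from the repository root.
missing = find_missing_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```

Running this from the repository root should surface a misplaced or incomplete download early, rather than as an error during model loading.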
3. Download checkpoint and run inference
from huggingface_hub import snapshot_download
from medfuseseg import MedFuseSegPipeline

# Download the full model repository into ./ckpts.
snapshot_download(repo_id="biodatlab/medfuse-seg", local_dir="ckpts", repo_type="model")

# Load the pipeline from the downloaded checkpoint directory.
pipe = MedFuseSegPipeline(checkpoint="ckpts")

result = pipe(
    image="chest_xray.png",  # filepath, URL, PIL Image, or numpy array
    prompt="Segment the pneumonia region",
)

print(result.text)  # "The affected lung parenchyma is [SEG]..."
result.save_mask("mask.png")    # save the predicted mask
result.save_overlay("vis.png")  # save the visualization
MedGemma-4B-IT will be downloaded automatically from HuggingFace Hub on first run.
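To process several images in one go, the single-call pattern above can be wrapped in a loop. The sketch below assumes only the interface shown in the inference example: a callable that takes `image` and `prompt` keyword arguments and returns an object with `.text`, `.save_mask(path)`, and `.save_overlay(path)`. The helper `run_batch`, the output directory, and the file-naming scheme are hypothetical and not part of the medfuseseg package.

```python
from pathlib import Path


def run_batch(pipe, cases, out_dir="outputs"):
    """Run `pipe` on (image, prompt) pairs; save a mask and an overlay per case.

    Returns the list of text responses, in the same order as `cases`.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    answers = []
    for i, (image, prompt) in enumerate(cases):
        result = pipe(image=image, prompt=prompt)
        result.save_mask(str(out / f"case{i:03d}_mask.png"))
        result.save_overlay(str(out / f"case{i:03d}_overlay.png"))
        answers.append(result.text)
    return answers


# Example usage (requires the checkpoints from the steps above):
# pipe = MedFuseSegPipeline(checkpoint="ckpts")
# answers = run_batch(pipe, [("chest_xray.png", "Segment the pneumonia region")])
```

Passing the pipeline in as an argument keeps the model loaded once for the whole batch and lets the helper be tested with a stand-in callable.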
Citation
@inproceedings{LimKee_MedFuseSeg_MICCAI2026,
  title = {MedFuse-Seg: Multi-Level Visual and Semantic Context Fusion for Segmentation-Based Medical Reasoning},
  author = {Limaroon, Keetawan and Chiewhawan, Monrada and Timklaypachara, Watcharapong and Vateekul, Peerapon and Achakulvisut, Titipat},
  booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2026},
  year = {2026}
}
License
Apache-2.0