MedFuse-Seg: Multi-Level Visual and Semantic Context Fusion for Segmentation-Based Medical Reasoning (MICCAI 2026)

MedFuse-Seg bridges the semantic-spatial gap in language-driven medical image analysis by combining multi-level visual feature injection with LLM-guided mask decoding. Built on MedGemma-4B, MedSigLIP, and MedSAM, the model allows clinicians to obtain both diagnostic reasoning and precise anatomical segmentation through natural language prompts.

For full training details and hyperparameters, see the project repository and the Med-ReasonSeg dataset.

Model Performance

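Relative to zero-shot BiomedParse, MedFuse-Seg improves average DSC by 13.49 percentage points and reduces average HD95 by 54.04 px. Relative to fine-tuned LISA-7B (same training setup), it improves average DSC by 4.89 percentage points and reduces average HD95 by 15.29 px.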
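DSC in the comparison above is the Dice similarity coefficient, which measures the overlap between a predicted mask and a reference mask. HD95, the 95th-percentile Hausdorff distance, measures boundary error and is not shown here. As a rough illustration only, and not the project's evaluation code, the sketch below computes Dice for two binary masks given as nested lists. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (2D nested lists)."""
    # Flatten both masks and treat any non-zero value as foreground.
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # Count pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 3 foreground pixels each, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(dice_coefficient(pred, target))  # 2 * 2 / (3 + 3), about 0.667
```

How scores are aggregated over a test set (per image, per class, or pooled) is a separate choice that this sketch does not address.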

| Method | DSC (Ref) | DSC (Sem) | DSC (Avg) | HD95 (Ref) | HD95 (Sem) | HD95 (Avg) |
|---|---|---|---|---|---|---|
| SAM 3 (zero-shot) | 0.1425 | 0.1167 | 0.1296 | 373.52 | 370.50 | 372.01 |
| BiomedParse (zero-shot) | 0.6703 | 0.6344 | 0.6524 | 105.23 | 115.97 | 110.60 |
| LISA-7B (fine-tuned) | 0.7398 | 0.7370 | 0.7384 | 71.55 | 72.13 | 71.84 |
| MedFuse-Seg (Ours) | 0.7879 | 0.7867 | 0.7873 | 56.46 | 56.65 | 56.55 |

Evaluated on the Med-ReasonSeg test set. LISA-7B was retrained with an identical training setup for a fair comparison.
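The headline margins can be re-derived from the average columns of the table. The snippet below is a small arithmetic check using only the rounded values printed above; the dictionary layout and the helper name `margins` are made up for this example. With these rounded inputs the BiomedParse HD95 gap comes out as 54.05 px rather than the quoted 54.04 px, a difference that presumably reflects rounding of the underlying unrounded scores.

```python
# Average DSC and HD95 values copied from the results table.
avg_scores = {
    "BiomedParse (zero-shot)": {"dsc": 0.6524, "hd95": 110.60},
    "LISA-7B (fine-tuned)": {"dsc": 0.7384, "hd95": 71.84},
    "MedFuse-Seg (Ours)": {"dsc": 0.7873, "hd95": 56.55},
}


def margins(ours, baseline):
    """Return (DSC gain in percentage points, HD95 reduction in px)."""
    dsc_gain = round((ours["dsc"] - baseline["dsc"]) * 100, 2)
    hd95_drop = round(baseline["hd95"] - ours["hd95"], 2)
    return dsc_gain, hd95_drop


ours = avg_scores["MedFuse-Seg (Ours)"]
for name in ("BiomedParse (zero-shot)", "LISA-7B (fine-tuned)"):
    dsc_gain, hd95_drop = margins(ours, avg_scores[name])
    print(f"vs {name}: +{dsc_gain:.2f} DSC points, -{hd95_drop:.2f} px HD95")
```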

Download & Use

1. Install dependencies

git clone https://github.com/biodatlab/medfuse-seg.git
cd medfuse-seg
pip install -r requirements.txt

2. Download MedSAM checkpoint (required)

Download from the original MedSAM paper's repository:

gdown "https://drive.google.com/uc?id=1UAmWL88roYR7wKlnApw5Bcuzf2iQgk6_"

Place medsam_vit_b.pth in the repository root.
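Before moving on to inference, it can help to confirm that the checkpoint really is in the repository root. The check below is an optional sketch that uses only the standard library and assumes it is run from the repository root; the helper name `find_medsam_checkpoint` is hypothetical and not part of the project.

```python
from pathlib import Path


def find_medsam_checkpoint(repo_root=".", filename="medsam_vit_b.pth"):
    """Return the checkpoint path if it exists under repo_root, else raise."""
    path = Path(repo_root) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{filename} not found in {Path(repo_root).resolve()}; "
            "download it and place it in the repository root."
        )
    return path


if __name__ == "__main__":
    try:
        print(f"Found MedSAM checkpoint: {find_medsam_checkpoint()}")
    except FileNotFoundError as err:
        print(err)
```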

3. Download checkpoint and run inference

from huggingface_hub import snapshot_download
from medfuseseg import MedFuseSegPipeline

# Fetch every file in the model repository into ./ckpts
snapshot_download(repo_id="biodatlab/medfuse-seg", local_dir="ckpts", repo_type="model")

pipe = MedFuseSegPipeline(checkpoint="ckpts")

result = pipe(
    image="chest_xray.png",  # filepath, URL, PIL Image, or numpy array
    prompt="Segment the pneumonia region"
)

print(result.text)        # "The affected lung parenchyma is [SEG]..."
result.save_mask("mask.png")
result.save_overlay("vis.png")

MedGemma-4B-IT is downloaded automatically from the Hugging Face Hub on first run.
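To apply one prompt to several images, the single-image call can be wrapped in a loop. The helper below is a sketch rather than part of the package: `segment_folder`, the `*.png` filter, and the `_mask.png` naming scheme are assumptions for illustration. It relies only on what the example above shows, namely that the pipeline is called with `image` and `prompt` and that the result exposes `text` and `save_mask`.

```python
from pathlib import Path


def segment_folder(pipe, image_dir, prompt, out_dir="masks"):
    """Run `pipe` on every PNG in image_dir and save one mask per image."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    answers = {}
    for image_path in sorted(Path(image_dir).glob("*.png")):
        # Same call shape as the single-image example: image plus prompt.
        result = pipe(image=str(image_path), prompt=prompt)
        # Hypothetical naming scheme: <original stem>_mask.png
        result.save_mask(str(out_path / f"{image_path.stem}_mask.png"))
        # Keep the generated reasoning text, keyed by input file name.
        answers[image_path.name] = result.text
    return answers
```

With a loaded pipeline, a call such as `segment_folder(pipe, "scans", "Segment the pneumonia region")` would write the masks to `masks/` and return the generated text per file; `scans` is a placeholder folder name.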

Citation

@inproceedings{LimKee_MedFuseSeg_MICCAI2026,
  title={MedFuse-Seg: Multi-Level Visual and Semantic Context Fusion for Segmentation-Based Medical Reasoning},
  author={Limaroon, Keetawan and Chiewhawan, Monrada and Timklaypachara, Watcharapong and Vateekul, Peerapon and Achakulvisut, Titipat},
  booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2026},
  year={2026}
}

License

Apache-2.0
