MedFuse-Seg: Multi-Level Visual and Semantic Context Fusion for Segmentation-Based Medical Reasoning (MICCAI 2026)

MedFuse-Seg bridges the semantic-spatial gap in language-driven medical image analysis by combining multi-level visual feature injection with LLM-guided mask decoding. Built on MedGemma-4B, MedSigLIP, and MedSAM, the model allows clinicians to obtain both diagnostic reasoning and precise anatomical segmentation through natural language prompts.

For full training details and hyperparameters, see the project repository (https://github.com/biodatlab/medfuse-seg) and the Med-ReasonSeg dataset.

Model Performance

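MedFuse-Seg outperforms zero-shot BiomedParse by 13.49 percentage points in average DSC and 54.04 px in average HD95, and fine-tuned LISA-7B (same training setup) by 4.89 percentage points in average DSC and 15.29 px in average HD95. Higher DSC and lower HD95 are better.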
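The headline margins can be recomputed from the average columns of the results table. The sketch below does only that arithmetic; the dictionary and the `margins` helper are illustrative names, not part of the MedFuse-Seg codebase. Because the table entries are rounded, a recomputed margin can differ from the quoted one in the last digit (the BiomedParse HD95 margin comes out as 54.05 px from the rounded entries, presumably because the quoted 54.04 px was derived from unrounded values).

```python
# Average DSC and HD95 values copied from the results table (rounded as published).
results = {
    "BiomedParse (zero-shot)": {"dsc": 0.6524, "hd95": 110.60},
    "LISA-7B (fine-tuned)": {"dsc": 0.7384, "hd95": 71.84},
    "MedFuse-Seg (Ours)": {"dsc": 0.7873, "hd95": 56.55},
}


def margins(baseline, ours="MedFuse-Seg (Ours)"):
    """Return (DSC gain in percentage points, HD95 reduction in px) over a baseline."""
    dsc_gain = (results[ours]["dsc"] - results[baseline]["dsc"]) * 100
    hd95_drop = results[baseline]["hd95"] - results[ours]["hd95"]
    return round(dsc_gain, 2), round(hd95_drop, 2)


for name in ("BiomedParse (zero-shot)", "LISA-7B (fine-tuned)"):
    dsc_gain, hd95_drop = margins(name)
    print(f"{name}: +{dsc_gain:.2f} DSC points, -{hd95_drop:.2f} px HD95")
```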

Method	DSC (Ref)	DSC (Sem)	DSC (Avg)	HD95 (Ref)	HD95 (Sem)	HD95 (Avg)
SAM 3 (zero-shot)	0.1425	0.1167	0.1296	373.52	370.50	372.01
BiomedParse (zero-shot)	0.6703	0.6344	0.6524	105.23	115.97	110.60
LISA-7B (fine-tuned)	0.7398	0.7370	0.7384	71.55	72.13	71.84
MedFuse-Seg (Ours)	0.7879	0.7867	0.7873	56.46	56.65	56.55

All methods were evaluated on the Med-ReasonSeg test set. LISA-7B was retrained with an identical training setup for a fair comparison.
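For readers unfamiliar with the two metrics, here is a minimal NumPy sketch of DSC and HD95 on binary 2D masks. It is not the evaluation code behind the table above. The helper names are illustrative, and it assumes one common HD95 convention: the larger of the two directed 95th-percentile boundary distances, measured in pixels. Other conventions exist and give slightly different numbers. The brute-force distance matrix is only practical for small masks.

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def boundary_points(mask):
    """Coordinates of foreground pixels with at least one background 4-neighbour."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1]    # up and down neighbours
        & padded[1:-1, :-2] & padded[1:-1, 2:]  # left and right neighbours
    )
    return np.argwhere(mask & ~interior)


def hd95(pred, gt):
    """95th-percentile Hausdorff distance between mask boundaries, in pixels."""
    p, g = boundary_points(pred), boundary_points(gt)
    if len(p) == 0 or len(g) == 0:
        return float("nan")  # undefined when either mask is empty
    # Pairwise Euclidean distances between the two boundary point sets.
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    pred_to_gt = np.percentile(d.min(axis=1), 95)
    gt_to_pred = np.percentile(d.min(axis=0), 95)
    return float(max(pred_to_gt, gt_to_pred))
```

DSC rewards overlap, while HD95 penalizes boundary outliers, so a prediction can score well on one and poorly on the other. This is why the table reports both.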

Download & Use

1. Install dependencies

git clone https://github.com/biodatlab/medfuse-seg.git
cd medfuse-seg
pip install -r requirements.txt

2. Download MedSAM checkpoint (required)

Download from the original MedSAM paper's repository:

gdown "https://drive.google.com/uc?id=1UAmWL88roYR7wKlnApw5Bcuzf2iQgk6_"

Place medsam_vit_b.pth in the repository root.
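A quick check before running inference can catch a missing or incomplete download early. The sketch below uses only the standard library. The function name is hypothetical, and it assumes the pipeline is launched from the repository root, as the step above implies.

```python
from pathlib import Path


def check_medsam_checkpoint(repo_root=".", filename="medsam_vit_b.pth"):
    """Return the checkpoint path, or raise a clear error if it is missing or empty."""
    path = Path(repo_root) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{filename} not found in {Path(repo_root).resolve()}; "
            "download it and place it in the repository root."
        )
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty; the download may have failed.")
    return path
```

Calling `check_medsam_checkpoint()` from the repository root returns the path when the file is in place.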

3. Download checkpoint and run inference

from huggingface_hub import snapshot_download
from medfuseseg import MedFuseSegPipeline

# Fetch every file in the model repository into ./ckpts
snapshot_download(repo_id="biodatlab/medfuse-seg", local_dir="ckpts", repo_type="model")

pipe = MedFuseSegPipeline(checkpoint="ckpts")

result = pipe(
    image="chest_xray.png",  # filepath, URL, PIL Image, or numpy array
    prompt="Segment the pneumonia region"
)

print(result.text)        # "The affected lung parenchyma is [SEG]..."
result.save_mask("mask.png")
result.save_overlay("vis.png")

MedGemma-4B-IT is downloaded automatically from the Hugging Face Hub on first run.
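To post-process a saved mask outside the pipeline, a custom overlay can be built with NumPy. This is a sketch under assumptions, not how `result.save_overlay` works internally: it assumes the mask is a single-channel array in which non-zero pixels mark the segmented region, and `overlay_mask` is a hypothetical helper.

```python
import numpy as np


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Alpha-blend a solid color over the masked region of an image.

    image: (H, W) grayscale or (H, W, 3+) uint8 array.
    mask:  (H, W) array; non-zero pixels are treated as foreground (assumption).
    """
    image = np.asarray(image)
    if image.ndim == 2:
        # Promote grayscale to RGB so the overlay color is visible.
        image = np.stack([image] * 3, axis=-1)
    foreground = np.asarray(mask) > 0
    if foreground.shape != image.shape[:2]:
        raise ValueError("image and mask must have the same height and width")
    out = image[..., :3].astype(np.float32)
    out[foreground] = (1 - alpha) * out[foreground] + alpha * np.asarray(color, dtype=np.float32)
    return out.round().astype(np.uint8)
```

The arrays can be loaded with, for example, `np.asarray(PIL.Image.open("mask.png"))`. The fraction of the image covered by the mask is then `(mask > 0).mean()`.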

Citation

@inproceedings{LimKee_MedFuseSeg_MICCAI2026,
  title={MedFuse-Seg: Multi-Level Visual and Semantic Context Fusion for Segmentation-Based Medical Reasoning},
  author={Limaroon, Keetawan and Chiewhawan, Monrada and Timklaypachara, Watcharapong and Vateekul, Peerapon and Achakulvisut, Titipat},
  booktitle={Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2026},
  year={2026}
}

License

Apache-2.0


Model tree for biodatlab/medfuse-seg

Base model: google/gemma-3-4b-pt
Finetuned: google/medgemma-4b-pt
Finetuned: google/medgemma-4b-it
Finetuned: biodatlab/medfuse-seg (this model)
