SEER: Skill-Evolving Grounded Reasoning for Free-Text Promptable 3D Medical Image Segmentation
Model & Dataset Description
SEER is a vision-language reasoning model designed for robust free-text promptable 3D medical image segmentation. It grounds clinical language in image evidence, evolves reusable reasoning skills, and produces an executable target specification for a downstream segmentation backbone.
The model follows a structured SEER reasoning format:
<evidence>
Image-grounded observations about the visible anatomical or pathological target.
</evidence>
<rationale>
The selected reasoning skill, if a skill bank is provided, followed by the reasoning process that maps the raw clinical request and image evidence to a normalized target specification.
</rationale>
<answer>
The executable target specification for the downstream segmentation backbone.
</answer>
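Because every response wraps its three parts in fixed tags, downstream code can split it with a small parser. The following is a minimal sketch, assuming the tags appear exactly as shown above and each appears at most once. `SeerOutput` and `parse_seer_output` are hypothetical names, and the sample response is invented for illustration rather than produced by SEER.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches one <tag>...</tag> block; DOTALL lets the body span multiple lines.
_TAG_PATTERN = r"<{tag}>\s*(.*?)\s*</{tag}>"


@dataclass
class SeerOutput:
    evidence: Optional[str]
    rationale: Optional[str]
    answer: Optional[str]


def _extract(text: str, tag: str) -> Optional[str]:
    """Return the stripped body of the first <tag>...</tag> block, or None if absent."""
    match = re.search(_TAG_PATTERN.format(tag=tag), text, flags=re.DOTALL)
    return match.group(1) if match else None


def parse_seer_output(text: str) -> SeerOutput:
    """Split a generated response into its evidence, rationale, and answer sections."""
    return SeerOutput(
        evidence=_extract(text, "evidence"),
        rationale=_extract(text, "rationale"),
        answer=_extract(text, "answer"),
    )


if __name__ == "__main__":
    # Invented response, used only to exercise the parser.
    sample = """<evidence>
A hyperintense lesion is visible in the left frontal lobe.
</evidence>
<rationale>
The request "mets" abbreviates "metastases"; the visible lesion matches.
</rationale>
<answer>
brain metastasis
</answer>"""
    parsed = parse_seer_output(sample)
    print(parsed.answer)  # prints: brain metastasis
```

Returning `None` for a missing section, instead of raising, lets the caller decide whether an incomplete response should be retried or rejected.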
SEER-Trace is the grounded reasoning dataset used to train the SEER model. It is curated from established 3D medical segmentation benchmarks and augments each case with clinician-like free-text requests and structured reasoning traces.
The released SEER-Trace split is intended for evaluation and covers two settings:
| Dataset | Modality | Evaluation Role |
|---|---|---|
| BrainMetShare | MRI | Partial-OOD / domain-shift evaluation: brain anatomy is within the seen anatomical domain, while the institutional sources and target labels are outside the training coverage. |
| PENGWIN | CT | Strict OOD evaluation: both the pelvic anatomy and pelvic bone target labels are absent from SEER-Trace reasoning supervision and target coverage. |
✨ Key Features
- Free-text clinical request robustness: Handles linguistic variations such as synonyms, abbreviations, and high-level descriptions of clinical intent
- Multi-modality support: Works across CT and MRI imaging modalities
- Image-grounded reasoning: Identifies image-grounded evidence and uses it to resolve the clinical request
- Evolving reasoning skills: Distills high-reward reasoning traces into reusable skills and continuously updates the skill bank according to each skill's utility
- Backbone-independent gains: Shows consistent robustness improvements across different downstream 3D segmentation backbones
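The skill-evolution idea can be illustrated with a toy skill bank. This is a hypothetical sketch, not SEER's actual algorithm: the reward threshold for distillation, the exponential-moving-average utility update, and the capacity-based eviction of the lowest-utility skill are all assumptions, as are the names `Skill` and `SkillBank`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    name: str
    description: str
    utility: float = 0.0  # assumed: running estimate of how much the skill helps


@dataclass
class SkillBank:
    capacity: int = 3              # hypothetical maximum number of stored skills
    reward_threshold: float = 0.8  # hypothetical cutoff for a "high-reward" trace
    momentum: float = 0.9          # hypothetical EMA smoothing factor
    skills: Dict[str, Skill] = field(default_factory=dict)

    def distill(self, name: str, description: str, reward: float) -> bool:
        """Add a skill distilled from a trace only if the trace's reward clears the threshold."""
        if reward < self.reward_threshold or name in self.skills:
            return False
        self.skills[name] = Skill(name, description, utility=reward)
        self._prune()
        return True

    def update(self, name: str, reward: float) -> None:
        """Blend a new reward observation into the skill's utility (exponential moving average)."""
        skill = self.skills[name]
        skill.utility = self.momentum * skill.utility + (1 - self.momentum) * reward

    def top_k(self, k: int) -> List[str]:
        """Return the names of the k highest-utility skills."""
        ranked = sorted(self.skills.values(), key=lambda s: s.utility, reverse=True)
        return [s.name for s in ranked[:k]]

    def _prune(self) -> None:
        """Evict the lowest-utility skills while the bank exceeds its capacity."""
        while len(self.skills) > self.capacity:
            worst = min(self.skills.values(), key=lambda s: s.utility)
            del self.skills[worst.name]


if __name__ == "__main__":
    bank = SkillBank(capacity=2)
    bank.distill("expand-abbreviation", "Map abbreviations such as 'mets' to full target names.", reward=0.95)
    bank.distill("normalize-synonym", "Map synonyms to a canonical label.", reward=0.85)
    bank.distill("weak-skill", "Distilled from a poor trace.", reward=0.40)  # rejected: below threshold
    bank.update("normalize-synonym", reward=0.0)  # utility decays to about 0.765
    print(bank.top_k(2))  # prints: ['expand-abbreviation', 'normalize-synonym']
```

Tracking utility as a moving average lets a skill that helped early but stops paying off drift downward until it is evicted, which is one simple way to keep a bounded bank aligned with recent usefulness.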
🧩 Versions
We release multiple SEER versions (continuously updated) to enable both reproducible research and high-performance downstream applications.
SEER v1.1 (Recommended)
- Info: Recommended default version
- Contents: SEER-Trace v1.1 and corresponding model weights (LoRA weights)
- Training Scale: Trained on all datasets from the paper and additional sources (~33,714 traces in total)
- Fine-tuning: LoRA fine-tuning, enabling efficient adaptation while preserving the general capabilities of the Qwen3-VL backbone
- Use Case: Recommended for general inference and downstream integration. This version maximizes supervision and concept coverage for stronger general-purpose performance
SEER v1.0 (Deprecated)
- Info: This version was used for the experiments in the paper but contains known issues that have been fixed in v1.1. It is not recommended for general use.
- Contents: SEER-Trace v1.0 and corresponding model weights (full weights)
- Training Scale: Trained on original datasets (~22,330 traces in total)
- Fine-tuning: Full-parameter fine-tuning
- Use Case: Reproducibility of the results reported in the paper
⚠️ Usage Instructions
This release contains only the VLM reasoning weights. 3D segmentation backbones, such as VoxTell or MedSAM3, should be integrated separately.
Please refer to our official GitHub repository for detailed instructions on environment setup, weight loading, and inference.
- GitHub Repository: SEER on GitHub
- Paper: arXiv
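Because the reasoning model and the segmentation backbone are decoupled, integration amounts to handing the `<answer>` text to whichever backbone is used. The Python sketch below shows that hand-off under stated assumptions: `ReasoningModel`, `SegmentationBackbone`, `segment_from_request`, and the two stub classes are hypothetical and do not reflect the APIs of SEER, VoxTell, or MedSAM3. Volumes are plain nested lists rather than real image arrays.

```python
import re
from typing import List, Protocol

# Hypothetical stand-ins: a volume is indexed volume[z][y][x]; a mask has the same shape with 0/1 labels.
Volume = List[List[List[float]]]
Mask = List[List[List[int]]]


class ReasoningModel(Protocol):
    """Anything that turns (volume, free-text request) into a tagged SEER-style response."""

    def generate(self, volume: Volume, request: str) -> str: ...


class SegmentationBackbone(Protocol):
    """Anything that segments a volume given a normalized target specification."""

    def segment(self, volume: Volume, target: str) -> Mask: ...


def extract_answer(response: str) -> str:
    """Return the text inside <answer>...</answer>, raising if the block is missing."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("response contains no <answer> block")
    return match.group(1)


def segment_from_request(
    volume: Volume,
    request: str,
    reasoner: ReasoningModel,
    backbone: SegmentationBackbone,
) -> Mask:
    """Stage 1: reason about the request; stage 2: segment the resolved target."""
    response = reasoner.generate(volume, request)
    target = extract_answer(response)
    return backbone.segment(volume, target)


class FixedTargetReasoner:
    """Hypothetical stub that ignores its inputs and always resolves to one target."""

    def generate(self, volume: Volume, request: str) -> str:
        return (
            "<evidence>stub evidence</evidence>"
            "<rationale>stub rationale</rationale>"
            "<answer>left kidney</answer>"
        )


class ThresholdBackbone:
    """Hypothetical stub that labels voxels above 0.5 and ignores the target text."""

    def segment(self, volume: Volume, target: str) -> Mask:
        return [[[1 if v > 0.5 else 0 for v in row] for row in plane] for plane in volume]


if __name__ == "__main__":
    volume = [[[0.2, 0.9], [0.7, 0.1]]]  # a single 2x2 slice
    mask = segment_from_request(volume, "Segment the left kidney", FixedTargetReasoner(), ThresholdBackbone())
    print(mask)  # prints: [[[0, 1], [1, 0]]]
```

Typing the two stages as structural protocols means a different backbone can be swapped in without touching the reasoning side, mirroring the backbone-independent design described above.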
🩺 Ethical Considerations
Medical image models can produce plausible but incorrect explanations. Users should treat outputs as research results, not clinical conclusions. Do not use this model to replace professional medical judgment.
Citation
@InProceedings{zhang2026seer,
author = { Zhang, Tongrui and Wang, Chenhui and Li, Yongming and Chen, Zhihao and Zhan, Xufeng and Shan, Hongming},
title = { Skill-Evolving Grounded Reasoning for Free-Text Promptable 3D Medical Image Segmentation },
booktitle = { Medical Image Computing and Computer Assisted Intervention },
year = { 2026 }
}
Base model: Qwen/Qwen3-VL-4B-Instruct