Instructions to use UCSC-VLAA/ClinSeek-35B-A3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use UCSC-VLAA/ClinSeek-35B-A3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

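# The messages below include an image, so use the image-text-to-text task,
# which accepts chat messages through the `text=` keyword.
pipe = pipeline("image-text-to-text", model="UCSC-VLAA/ClinSeek-35B-A3B")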
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("UCSC-VLAA/ClinSeek-35B-A3B")
model = AutoModelForImageTextToText.from_pretrained("UCSC-VLAA/ClinSeek-35B-A3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use UCSC-VLAA/ClinSeek-35B-A3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "UCSC-VLAA/ClinSeek-35B-A3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UCSC-VLAA/ClinSeek-35B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/UCSC-VLAA/ClinSeek-35B-A3B

SGLang

How to use UCSC-VLAA/ClinSeek-35B-A3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "UCSC-VLAA/ClinSeek-35B-A3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UCSC-VLAA/ClinSeek-35B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "UCSC-VLAA/ClinSeek-35B-A3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UCSC-VLAA/ClinSeek-35B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use UCSC-VLAA/ClinSeek-35B-A3B with Docker Model Runner:
```
docker model run hf.co/UCSC-VLAA/ClinSeek-35B-A3B
```

ClinSeek-35B-A3B

ClinSeek-35B-A3B is our open-source model for ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning. We trained it by supervised fine-tuning from Qwen/Qwen3.5-35B-A3B on ClinSeekAgent trajectories generated by Claude Opus 4.6.

ClinSeekAgent studies a clinical reasoning setting where evidence is not handed to the model in a pre-curated prompt. Instead, an agent must actively retrieve patient-specific evidence from raw EHR tables, consult external medical knowledge when needed, and synthesize the acquired evidence into a final decision. ClinSeek-35B-A3B is trained to imitate this long-horizon evidence-seeking behavior in native tool-call format.

Figure: ClinSeek-35B-A3B performance on AgentEHR-Bench.

Release Information

| Item | Value |
| --- | --- |
| Model | `ClinSeek-35B-A3B` |
| Base model | `Qwen/Qwen3.5-35B-A3B` |
| Training method | Supervised fine-tuning |
| Teacher model | Claude Opus 4.6 |
| Training signal | ClinSeekAgent evidence-seeking trajectories |
| Primary target setting | Agentic EHR evidence seeking |
| Technical report | https://arxiv.org/abs/2605.20176 |
| Code | https://github.com/UCSC-VLAA/ClinSeekAgent |
| Benchmark metadata | https://huggingface.co/datasets/UCSC-VLAA/ClinSeek-Bench |
| Project page | https://ucsc-vlaa.github.io/ClinSeekAgent/ |

Training Data And Objective

ClinSeek-35B-A3B validates ClinSeekAgent as a training-time pipeline. Claude Opus 4.6 is used as the teacher model to generate ClinSeekAgent trajectories from the training split of the text-based benchmark. The student model is then fine-tuned with supervised learning on the resulting trajectories.

The trajectories are rendered in native tool-call format with `<tool_call>` / `<tool_response>` turns, so the model learns how to search the EHR rather than only how to imitate final answers.
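To make the turn structure concrete, here is a minimal sketch of pulling tool calls out of one assistant turn. It assumes each `<tool_call>` block wraps a JSON object with `name` and `arguments` keys. That payload layout, the `extract_tool_calls` helper, and the `run_sql` tool name are illustrative assumptions, not the format confirmed by the ClinSeekAgent repository; the model's chat template may serialize calls differently.

```python
import json
import re

# Assumption: each call is a JSON object wrapped in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(assistant_text):
    """Return the parsed tool calls found in one assistant turn."""
    calls = []
    for payload in TOOL_CALL_RE.findall(assistant_text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Hypothetical assistant turn; the tool name and SQL are made up for illustration.
example_turn = (
    "I need the patient's recent lab results.\n"
    "<tool_call>\n"
    '{"name": "run_sql", "arguments": {"query": "SELECT * FROM labevents LIMIT 5"}}\n'
    "</tool_call>"
)
parsed_calls = extract_tool_calls(example_turn)
print(parsed_calls)
```

In an agent loop, each parsed call would be dispatched to the matching tool and its result returned to the model in a `<tool_response>` turn; the real tool schemas live in the ClinSeekAgent repository.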

Training configuration:

| Component | Configuration |
| --- | --- |
| Base model | Qwen3.5-35B-A3B |
| Training objective | SFT on ClinSeekAgent trajectories |
| Training / validation size | 7,204 / 147 examples |
| Maximum sequence length | 52,000 tokens |
| Training epochs | 3 |
| Global batch size | 32 |
| Micro batch size | 1 per GPU |
| Optimizer | Megatron optimizer with CPU offload |
| Learning rate | 2e-5 |
| Minimum learning rate | 2e-6 |
| Learning rate schedule | Cosine decay with 10 warmup steps |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
| Precision | bfloat16 |
| Backend | Megatron + mbridge |
| Hardware | 8 H200 GPUs |
| Tensor / expert / pipeline parallelism | TP=2, EP=8, PP=1 |
| Random seed | 42 |
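As a rough illustration of what this configuration implies, the sketch below derives the optimizer step count, the gradient-accumulation factor, and the learning-rate curve from the numbers in the table. Three things are assumptions rather than facts from the report: that the data-parallel size is GPUs / (TP × PP), that a trailing partial batch is dropped each epoch, and that warmup is linear. The variable and function names are illustrative only.

```python
import math

# Values taken from the configuration table above.
TRAIN_EXAMPLES = 7204
GLOBAL_BATCH = 32
MICRO_BATCH = 1
EPOCHS = 3
NUM_GPUS = 8
TP, PP = 2, 1
PEAK_LR, MIN_LR = 2e-5, 2e-6
WARMUP_STEPS = 10

# Assumption: data-parallel size = GPUs / (TP * PP).
data_parallel = NUM_GPUS // (TP * PP)
grad_accum_steps = GLOBAL_BATCH // (MICRO_BATCH * data_parallel)

# Assumption: a trailing partial batch is dropped each epoch.
steps_per_epoch = TRAIN_EXAMPLES // GLOBAL_BATCH
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR (0-indexed step)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


print(data_parallel, grad_accum_steps, steps_per_epoch, total_steps)
for step in (0, WARMUP_STEPS, total_steps // 2, total_steps):
    print(step, f"{learning_rate(step):.2e}")
```

Under these assumptions the run takes 675 optimizer steps, with 8 micro-batches accumulated per data-parallel rank for each step.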

This release contains the model weights and tokenizer files. It does not redistribute protected clinical source data, patient-level databases, private trajectories, experiment logs, or raw MIMIC-derived records.

Evaluation

We evaluate ClinSeek-35B-A3B on the five-task AgentEHR-Bench setting. It raises average F1 from 22.1 for the Qwen3.5-35B-A3B base model to 34.0, a gain of 11.9 points, and achieves the strongest open-source performance among the evaluated models. Four of the five tasks improve; Transfers drops slightly, from 18.1 to 16.7.

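| Model | Diagnoses | Labs | Microbiology | Procedures | Transfers | Avg. |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Qwen3.5-35B-A3B (base) | 36.6 | 17.7 | 16.2 | 21.9 | 18.1 | 22.1 |
| ClinSeek-35B-A3B | 55.4 | 38.5 | 27.6 | 31.7 | 16.7 | 34.0 |
| Delta | +18.8 | +20.8 | +11.4 | +9.8 | -1.4 | +11.9 |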
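The averages and deltas in the table can be reproduced from the per-task scores. The short sketch below does so, assuming the "Avg." column is the unweighted mean of the five task F1 scores; the variable names are illustrative.

```python
TASKS = ["Diagnoses", "Labs", "Microbiology", "Procedures", "Transfers"]

# Per-task F1 scores copied from the table above.
base_f1 = [36.6, 17.7, 16.2, 21.9, 18.1]
clinseek_f1 = [55.4, 38.5, 27.6, 31.7, 16.7]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Per-task change after SFT, rounded to one decimal like the table.
deltas = {task: round(new - old, 1) for task, old, new in zip(TASKS, base_f1, clinseek_f1)}
base_avg = round(mean(base_f1), 1)
clinseek_avg = round(mean(clinseek_f1), 1)

print(deltas)
print(base_avg, clinseek_avg, round(clinseek_avg - base_avg, 1))
```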

Our analysis shows that the distilled model learns a different tool-use policy, not just a different final-answer prior. On the same 500 AgentEHR-Bench questions, its free-form SQL use increases from 649 calls in the base model to 3,932 calls after SFT, about a sixfold increase (roughly 1.3 to 7.9 calls per question). This suggests that ClinSeekAgent trajectories teach the student to treat the EHR as a programmable database.

For full evaluation scripts and benchmark reconstruction instructions, see: https://github.com/UCSC-VLAA/ClinSeekAgent.

Usage

Use the checkpoint with a recent `transformers` release that supports Qwen3.5-MoE models. To reproduce the evaluation setting used in this work, serve the model with an OpenAI-compatible backend such as vLLM and run the ClinSeekAgent evaluation drivers.
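For a quick check against a served model, the following sketch builds an OpenAI-style chat-completions request and posts it with the Python standard library. It assumes a vLLM server is already running locally on port 8000; the `build_chat_request` and `send_chat_request` helpers, the system prompt, and the timeout are illustrative choices, not part of the ClinSeekAgent drivers.

```python
import json
import urllib.request

MODEL_ID = "UCSC-VLAA/ClinSeek-35B-A3B"
# Assumption: a local vLLM server listening on port 8000.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(question, max_tokens=512):
    """Build an OpenAI-style chat-completions payload for a single question."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a clinical evidence-seeking assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url=BASE_URL, timeout=600):
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request("Answer the clinical question using the available evidence.")
print(json.dumps(payload, indent=2))
# With a server running, uncomment to send the request:
# print(send_chat_request(payload))
```

This only exercises single-turn generation; it does not attach tool schemas or an EHR backend.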

Basic loading example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCSC-VLAA/ClinSeek-35B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": "You are a clinical evidence-seeking assistant.",
    },
    {
        "role": "user",
        "content": "Answer the clinical question using the available evidence.",
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

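# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))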

For tool-using evaluation, use the ClinSeekAgent repository rather than a single-turn text generation script. The repository provides the EHR MCP server, tool schemas, prompts, and scoring code expected by this model.

Citation

Please cite our ClinSeekAgent technical report if you use this model:

@article{clinseekagent2026,
  title = {ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning},
  year = {2026},
  url = {https://arxiv.org/abs/2605.20176}
}

Also cite the upstream datasets, benchmarks, and base models used in your experiments, including MIMIC, AgentEHR-Bench, and Qwen3.5-35B-A3B where applicable.