Libraries

How to use davanstrien/qwen35-9b-iconclass-sft-brillfull with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="davanstrien/qwen35-9b-iconclass-sft-brillfull")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("davanstrien/qwen35-9b-iconclass-sft-brillfull")
model = AutoModelForImageTextToText.from_pretrained("davanstrien/qwen35-9b-iconclass-sft-brillfull")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use davanstrien/qwen35-9b-iconclass-sft-brillfull with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "davanstrien/qwen35-9b-iconclass-sft-brillfull"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "davanstrien/qwen35-9b-iconclass-sft-brillfull",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
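
The same OpenAI-compatible request can be sent from Python. The sketch below is an illustration, not part of the generated snippet: the helper names `build_payload` and `classify` are hypothetical, and it assumes the server started above is reachable at `http://localhost:8000`. It swaps the generic prompt for the Iconclass instruction from the model card's own usage example and caps generation at 384 tokens, the budget the card reports results for.

```python
import json
import urllib.request

MODEL_ID = "davanstrien/qwen35-9b-iconclass-sft-brillfull"
# Instruction text taken from the model card's "How to use" example.
INSTRUCTION = ("Classify this image using Iconclass codes. "
               "Return a JSON object with key 'iconclass-codes' containing a list of codes.")


def build_payload(image_url: str, max_tokens: int = 384) -> dict:
    """Build a chat-completions request body (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "temperature": 0,  # greedy decoding, matching the card's evaluation setup
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": INSTRUCTION},
            ],
        }],
    }


def classify(image_url: str, base_url: str = "http://localhost:8000") -> str:
    """POST the request and return the raw text reply (assumes a running server)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(image_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

`classify("https://…/artwork.jpg")` would return the model's text reply, which still has to be parsed as JSON by the caller.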

Use Docker

docker model run hf.co/davanstrien/qwen35-9b-iconclass-sft-brillfull

SGLang

How to use davanstrien/qwen35-9b-iconclass-sft-brillfull with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "davanstrien/qwen35-9b-iconclass-sft-brillfull" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "davanstrien/qwen35-9b-iconclass-sft-brillfull",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "davanstrien/qwen35-9b-iconclass-sft-brillfull" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "davanstrien/qwen35-9b-iconclass-sft-brillfull",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use davanstrien/qwen35-9b-iconclass-sft-brillfull with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for davanstrien/qwen35-9b-iconclass-sft-brillfull to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for davanstrien/qwen35-9b-iconclass-sft-brillfull to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for davanstrien/qwen35-9b-iconclass-sft-brillfull to start chatting

Load model with FastModel

# Install first (shell command, not Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="davanstrien/qwen35-9b-iconclass-sft-brillfull",
    max_seq_length=2048,
)

Docker Model Runner
How to use davanstrien/qwen35-9b-iconclass-sft-brillfull with Docker Model Runner:
```
docker model run hf.co/davanstrien/qwen35-9b-iconclass-sft-brillfull
```

Qwen3.5-9B — Iconclass VLM (SFT on brillfull)

A vision–language model that labels artwork images with ICONCLASS iconographic codes (e.g. 11D35 Crucifixion, 25F23 roses, 61B:31D14 portrait of a man). Fine-tuned from unsloth/Qwen3.5-9B-Base (multimodal, arch qwen3_5) on davanstrien/iconclass-vlm-brillfull.

Input: one artwork image. Output: a JSON object {"iconclass-codes": ["...", ...]}.

TL;DR — why this model exists

This 9B model broke the recall ceiling that a 4B version of the same pipeline could not get past. On a clean, contamination-free 788-image test set (full human labels):

| System | Hierarchical F1 | Code recall (%) | Code precision (%) | Notes |
|---|---|---|---|---|
| this 9B (single-shot) | 53.0 | 32.9 | 29.7 | best single system |
| 4B SFT (`…-sft-brillfull`) | 45.2 | 25.6 | 23.7 | predecessor; capability-bound at ~25% recall |
| anchored fusion (4B + retrieval + judge) | 48.5 | - | - | prior best pipeline (no training) |
| 9B + retrieval fusion | 51.7 | 36.3 | 18.4 | fusion now redundant (see below) |

- +7.8 H-F1 and +7.3 code-recall over the 4B, with the same data and the same recipe; only the model size changed. The ~25% recall wall that survived reward-tuning, fuller labels, and reasoning-distillation on the 4B moves with capacity: it was a 4B-size limit, not a fundamental task ceiling.
- The 9B alone beats the anchored retrieval+judge fusion pipeline (48.5) with zero inference-time machinery. Adding retrieval on top of the 9B (51.7) now lowers H-F1: the 9B already captures the visually recoverable recall, so retrieval's imprecise extra codes cost more in precision (29.7 → 18.4) than they add in recall (32.9 → 36.3).
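
The arithmetic behind these two claims can be checked directly from the table. The sketch below recomputes the gains over the 4B and, as an extra illustration, an exact-match F1 from the reported code precision and recall. That exact-match F1 is derived here and is not a number reported by the author, and it is not the hierarchical F1, which also awards partial credit.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same 0-100 scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the results table above.
nine_b = {"h_f1": 53.0, "recall": 32.9, "precision": 29.7}
four_b = {"h_f1": 45.2, "recall": 25.6, "precision": 23.7}
fusion = {"h_f1": 51.7, "recall": 36.3, "precision": 18.4}

# Gains of the 9B over the 4B.
h_f1_gain = round(nine_b["h_f1"] - four_b["h_f1"], 1)        # 7.8
recall_gain = round(nine_b["recall"] - four_b["recall"], 1)  # 7.3

# Derived exact-match F1: fusion gains recall but loses more precision.
exact_f1_single = round(f1(nine_b["precision"], nine_b["recall"]), 1)
exact_f1_fusion = round(f1(fusion["precision"], fusion["recall"]), 1)

print(h_f1_gain, recall_gain, exact_f1_single, exact_f1_fusion)
```

On these numbers the derived exact-match F1 falls from about 31.2 (9B alone) to about 24.4 (9B + retrieval), the same direction as the drop in H-F1.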

Intended use & limitations

Use: assisted iconographic cataloguing of (mostly European, 15th–19th c.) art images — suggest ICONCLASS codes for a human cataloguer to confirm. Multi-label.
Not for: authoritative/automatic cataloguing without review. ICONCLASS is specialized; expect a human in the loop.
Limitations:
- Valid-JSON ≈ 86% at max_new_tokens=384 (~14% of outputs are malformed/truncated) — so 53.0 is a conservative floor; a constrained-decoding / cleanup pass would recover a little more.
- Structural ceiling on non-visual codes — proverbs, named persons, literary/abstract subjects are not determinable from the image alone (no vision model recovers these; they need external knowledge / retrieval at inference).
- Trained on the Brill/Arkyves distribution; may transfer less well to very different visual domains.
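
The malformed-output limitation above could be softened with a small cleanup pass. What follows is a minimal sketch, not the author's pipeline: the function name `parse_iconclass_codes` is hypothetical, and the fallback assumes that every code is a double-quoted string beginning with a digit (as in `11D35` or `61B:31D14`). It tries strict JSON first and, if that fails, keeps only the fully quoted codes that appear after the `iconclass-codes` key, so a code cut off by truncation is dropped rather than guessed.

```python
import json
import re

# Assumption: a code is a double-quoted string that starts with a digit.
_CODE_RE = re.compile(r'"(\d[^"\n]*)"')


def parse_iconclass_codes(response: str) -> list[str]:
    """Return the predicted codes, salvaging what it can from truncated output."""
    try:
        payload = json.loads(response)
    except json.JSONDecodeError:
        # Fallback: scan the text after the key for complete quoted codes.
        _, _, tail = response.partition('"iconclass-codes"')
        return _CODE_RE.findall(tail)
    if not isinstance(payload, dict):
        return []
    codes = payload.get("iconclass-codes", [])
    if not isinstance(codes, list):
        return []
    return [code for code in codes if isinstance(code, str)]


print(parse_iconclass_codes('{"iconclass-codes": ["11D35", "25F23", "61B:3'))
```

A regex salvage like this is deliberately conservative. Constrained decoding would prevent malformed output in the first place, but it depends on the serving stack.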

How to use

import json, torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_id = "davanstrien/qwen35-9b-iconclass-sft-brillfull"
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

INSTRUCTION = ("Classify this image using Iconclass codes. "
               "Return a JSON object with key 'iconclass-codes' containing a list of codes.")
image = Image.open("artwork.jpg").convert("RGB")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": INSTRUCTION}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False,
                                     enable_thinking=False)  # this model does NOT use <think>
inputs = processor(text=text, images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=384, do_sample=False)
resp = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
codes = json.loads(resp)["iconclass-codes"]   # e.g. ["11D35", "25G3", ...]

Training

Base: unsloth/Qwen3.5-9B-Base (chat template cloned from Qwen/Qwen3.5-4B, <think> stripped).
Data: davanstrien/iconclass-vlm-brillfull, config sft (~82k train; full, cleaned labels; the test split is a contamination-safe held-out 788 used below).
Method: LoRA (r=16, all vision+language layers, 51M/9.46B = 0.54% trainable) via Unsloth, 1 epoch, bf16, per-device batch 16 × grad-accum 2, lr 2e-4 with a cosine schedule. Script: train_sft_brill.py --base-model unsloth/Qwen3.5-9B-Base (a drop-in over the 4B recipe).
Hardware/cost: HF Jobs a100-large (1× A100-80GB), ~3.6 h (~$9). eval_loss 0.428 (vs the 4B's 0.474).
Deps: transformers==5.2.0 (only version with qwen3_5), unsloth<2026.6, causal-conv1d (pinned wheel), flash-linear-attention — Qwen3.5's hybrid attention needs these.
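
As a quick sanity check on the figures above, the effective batch size, approximate optimizer step count, trainable fraction, and implied hourly cost can be worked out from the stated numbers. The training-set size is taken as a round 82,000 because the card only says "~82k", so the step count is an estimate rather than a logged value.

```python
import math

# Figures from the Data / Method / Hardware lines above.
train_examples = 82_000      # "~82k train" (approximate)
per_device_batch = 16
grad_accum = 2
num_gpus = 1                 # 1x A100-80GB

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_gpus

# One epoch, so total steps are roughly examples / effective batch.
steps_per_epoch = math.ceil(train_examples / effective_batch)

# Share of parameters updated by the LoRA adapter.
trainable_pct = 100 * 51e6 / 9.46e9

# Implied hourly rate from ~$9 over ~3.6 h.
hourly_cost = 9.0 / 3.6

print(effective_batch, steps_per_epoch, round(trainable_pct, 2), round(hourly_cost, 2))
```

Under these assumptions the run is about 2,563 optimizer steps at an effective batch of 32, at roughly $2.50 per GPU-hour.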

Evaluation

Ruler: davanstrien/iconclass-vlm-brillfull test (788 images, full human labels, contamination-safe split by filename hash), hierarchical-F1 with partial credit (eval_sft.py:_calculate_hierarchical_f1 — ancestor matches earn graded credit). Greedy decoding, enable_thinking=False. See the results table above. Note: raw H-F1 understates true performance because the ground truth is ~20–40% incomplete (many predicted codes are genuinely depicted but unlabeled) — a judge-corrected ruler (eval_corrected.py) credits these.
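
The scoring code in eval_sft.py is not reproduced in this card, so the following is only a sketch of one way "ancestor matches earn graded credit" could work. It assumes that a code's ancestors are its character prefixes (for example `11D3` is an ancestor of `11D35`) and scores a pair by shared-prefix length over the longer code's length. The names `prefix_credit` and `hierarchical_f1` are hypothetical, and the real `_calculate_hierarchical_f1` may weight levels differently or handle bracketed and colon-joined notations specially.

```python
import os


def prefix_credit(pred: str, gold: str) -> float:
    """Graded credit in [0, 1]: shared leading characters / longer code length."""
    longest = max(len(pred), len(gold))
    if longest == 0:
        return 0.0
    shared = len(os.path.commonprefix([pred, gold]))
    return shared / longest


def hierarchical_f1(pred_codes: list[str], gold_codes: list[str]) -> float:
    """F1 where each code is scored by its best partial match on the other side."""
    if not pred_codes or not gold_codes:
        return 0.0
    # Precision: how well each predicted code is supported by some gold code.
    precision = sum(
        max(prefix_credit(p, g) for g in gold_codes) for p in pred_codes
    ) / len(pred_codes)
    # Recall: how well each gold code is covered by some predicted code.
    recall = sum(
        max(prefix_credit(p, g) for p in pred_codes) for g in gold_codes
    ) / len(gold_codes)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(hierarchical_f1(["11D3"], ["11D35"]))   # partial credit for an ancestor
print(hierarchical_f1(["25F23"], ["11D35"]))  # no shared prefix, no credit
```

Under this scheme an exact match scores 1.0, predicting the parent `11D3` for `11D35` scores about 0.8, and an unrelated code scores 0.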

The research arc (so this can be picked up later)

This model is one step in an investigation (full logs in the model-training/iconclass-qwen35 repo — RESEARCH_LOG.md, WEAK_LABELING.md, CLAUDE.md):

1. 4B is capability-bound: reward-tuning, fuller labels, and reasoning-distillation all plateau at ~25% recall / ~45 H-F1.
2. Anchored fusion (4B + retriever + judge-gate, fuse_rank.py) was the prior deployable win (48.5).
3. Agent + abstain-reviewer = a ~90% precision weak-labeler (weak_label_quality.py, review_set.py, calibrate_review.py), validated as a data engine for unlabeled images (235B-VL via the HF router, no setup).
4. This 9B: capacity breaks the recall wall and beats the fusion pipeline → the headline result.

Suggested next steps

- Format cleanup (constrained decoding / a JSON-repair pass) to recover the ~14% invalid-JSON handicap.
- Noisy-student self-training: weak-label NEW images (biglam/european_art) with the agent + abstain-reviewer → SFT this 9B on brillfull-GT ∪ high-confidence weak-labels → eval clean-788 vs 53.0. Tests whether clean NEW-image data pushes past capacity, or the 9B is at the learnable-visual ceiling.
- External-knowledge tools at inference for the non-visual code residue (the structural ceiling).
- The retriever (davanstrien/iconclass-retriever-bge-ft) + agent stack remain useful for recall-priority cataloguing even though fusion is H-F1-redundant on this 9B.

Predecessor: davanstrien/qwen35-4b-iconclass-sft-brillfull (4B, 45.2 H-F1).
Dataset: davanstrien/iconclass-vlm-brillfull (full-label, contamination-safe).
Retriever: davanstrien/iconclass-retriever-bge-ft.
Source lineage: Brill ICONCLASS AI Test Set / Arkyves.
