Qwen3.5-9B — Iconclass VLM (SFT on brillfull)
A vision–language model that labels artwork images with ICONCLASS iconographic
codes (e.g. 11D35 Crucifixion, 25F23 roses, 61B:31D14 portrait of a man). Fine-tuned from
unsloth/Qwen3.5-9B-Base (multimodal, arch qwen3_5) on davanstrien/iconclass-vlm-brillfull.
Input: one artwork image. Output: a JSON object {"iconclass-codes": ["...", ...]}.
TL;DR — why this model exists
This 9B model broke the recall ceiling that a 4B version of the same pipeline could not. On a clean, contamination-free 788-image test set (full human labels):
| system | Hierarchical F1 | code-recall | code-prec | notes |
|---|---|---|---|---|
| this 9B (single-shot) | 53.0 | 32.9 | 29.7 | best single system |
| 4B SFT (…-sft-brillfull) | 45.2 | 25.6 | 23.7 | predecessor; capability-bound at ~25% recall |
| anchored fusion (4B + retrieval + judge) | 48.5 | — | — | prior best pipeline (no training) |
| 9B + retrieval fusion | 51.7 | 36.3 | 18.4 | fusion now redundant — see below |
- +7.8 H-F1 / +7.3 code-recall over the 4B — same data, same recipe, only the size changed. The ~25% recall wall that survived reward-tuning, fuller labels, and reasoning-distillation on the 4B is moved by capacity: it was a 4B-size limit, not a fundamental task ceiling.
- The 9B alone beats the anchored retrieval+judge fusion pipeline (48.5) with zero inference-time machinery. Adding retrieval on top of the 9B (51.7) now hurts H-F1 — the 9B already captures the visually-recoverable recall, so retrieval's imprecise extra codes cost more precision than they add.
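The precision cost of adding retrieval can be made concrete with the code-level numbers in the table. The sketch below takes the harmonic mean of the reported code-precision and code-recall for each system. This combined figure is not reported by the author, and if the table values are per-image averages it only approximates a true F1, so treat it as an illustration of the trade-off rather than a result. The names `reported`, `harmonic_mean`, and `combined` are illustrative.

```python
# Code-level precision/recall (in %) copied from the results table above.
reported = {
    "9B single-shot": {"precision": 29.7, "recall": 32.9},
    "4B SFT": {"precision": 23.7, "recall": 25.6},
    "9B + retrieval fusion": {"precision": 18.4, "recall": 36.3},
}


def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of two scores; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate combined score per system (an assumption, not a reported metric).
combined = {name: harmonic_mean(**vals) for name, vals in reported.items()}

for name, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24} {score:.1f}")
```

Relative to the 9B alone, fusion gains 3.4 points of code-recall (32.9 to 36.3) but gives up 11.3 points of code-precision (29.7 to 18.4), which is why the combined score drops under this approximation.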
Intended use & limitations
- Use: assisted iconographic cataloguing of (mostly European, 15th–19th c.) art images — suggest ICONCLASS codes for a human cataloguer to confirm. Multi-label.
- Not for: authoritative/automatic cataloguing without review. ICONCLASS is specialized; expect a human in the loop.
- Limitations:
  - Valid-JSON ≈ 86% at max_new_tokens=384 (~14% of outputs are malformed/truncated), so 53.0 is a conservative floor; a constrained-decoding / cleanup pass would recover a little more.
  - Structural ceiling on non-visual codes: proverbs, named persons, literary/abstract subjects are not determinable from the image alone (no vision model recovers these; they need external knowledge / retrieval at inference).
  - Trained on the Brill/Arkyves distribution; may transfer less well to very different visual domains.
How to use
import json, torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
model_id = "davanstrien/qwen35-9b-iconclass-sft-brillfull"
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)
INSTRUCTION = ("Classify this image using Iconclass codes. "
               "Return a JSON object with key 'iconclass-codes' containing a list of codes.")
image = Image.open("artwork.jpg").convert("RGB")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": INSTRUCTION}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False,
                                     enable_thinking=False)  # this model does NOT use <think>
inputs = processor(text=text, images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=384, do_sample=False)
resp = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
codes = json.loads(resp)["iconclass-codes"] # e.g. ["11D35", "25G3", ...]
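Because roughly 14% of outputs are malformed or truncated at max_new_tokens=384, the bare json.loads call above can raise. A minimal sketch of a tolerant parser follows; it is an assumption about what a cleanup pass could look like, not the author's code, and `parse_iconclass_codes` and `CODE_PATTERN` are hypothetical names. It tries strict JSON first, then salvages only the complete quoted strings that follow the "iconclass-codes" key, so a code cut off mid-string is dropped rather than guessed.

```python
import json
import re

# Matches one complete double-quoted string (no embedded quotes or newlines).
CODE_PATTERN = re.compile(r'"([^"\n]+)"')


def parse_iconclass_codes(resp: str) -> list[str]:
    """Return the code list from a model response, tolerating broken JSON."""
    try:
        codes = json.loads(resp)["iconclass-codes"]
        if isinstance(codes, list):
            return [c for c in codes if isinstance(c, str)]
    except (json.JSONDecodeError, KeyError, TypeError):
        pass  # fall through to the salvage path

    # Salvage path: look only at the text after the key and inside the list.
    _, _, tail = resp.partition('"iconclass-codes"')
    tail = tail.partition("[")[2]   # drop everything up to the opening bracket
    tail = tail.split("]")[0]       # stop at the closing bracket if present
    return CODE_PATTERN.findall(tail)


if __name__ == "__main__":
    truncated = '{"iconclass-codes": ["11D35", "25G3", "61B:31'
    print(parse_iconclass_codes(truncated))
```

Replacing the last line of the snippet above with `codes = parse_iconclass_codes(resp)` would return an empty list instead of raising when nothing can be recovered.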
Training
- Base: unsloth/Qwen3.5-9B-Base (chat template cloned from Qwen/Qwen3.5-4B, <think> stripped).
- Data: davanstrien/iconclass-vlm-brillfull, config sft (~82k train; full, cleaned labels; the test split is a contamination-safe held-out 788 used below).
- Method: LoRA (r=16, all vision+language layers, 51M/9.46B = 0.54% trainable) via Unsloth, 1 epoch, bf16, per-device batch 16 × grad-accum 2, lr 2e-4 cosine. train_sft_brill.py --base-model unsloth/Qwen3.5-9B-Base (a drop-in over the 4B recipe).
- Hardware/cost: HF Jobs a100-large (1× A100-80GB), 3.6 h ($9). eval_loss 0.428 (vs the 4B's 0.474).
- Deps: transformers==5.2.0 (only version with qwen3_5), unsloth<2026.6, causal-conv1d (pinned wheel), flash-linear-attention. Qwen3.5's hybrid attention needs these.
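The hyperparameters listed above imply a few derived quantities that are useful when reproducing the run. The sketch below works them out from the stated values only. It assumes a single device (as in the 1× A100 setup), treats the ~82k training-set size as exactly 82,000, and models the "cosine" schedule as a plain decay to zero with no warmup, since the warmup setting is not given. All names are illustrative and none come from train_sft_brill.py.

```python
import math

# Values stated in the Training section; TRAIN_EXAMPLES is approximate (~82k).
PER_DEVICE_BATCH = 16
GRAD_ACCUM = 2
NUM_DEVICES = 1
TRAIN_EXAMPLES = 82_000
PEAK_LR = 2e-4
TRAINABLE_PARAMS = 51e6
TOTAL_PARAMS = 9.46e9

# Examples consumed per optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# Optimizer steps in the single training epoch (approximate).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)

# Share of parameters updated by LoRA, in percent.
trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Plain cosine decay from peak_lr to 0; warmup is omitted (assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"effective batch: {effective_batch}")
    print(f"steps per epoch: ~{steps_per_epoch}")
    print(f"trainable share: {trainable_pct:.2f}%")
    print(f"lr at midpoint:  {cosine_lr(steps_per_epoch // 2, steps_per_epoch):.2e}")
```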
Evaluation
Ruler: davanstrien/iconclass-vlm-brillfull test (788 images, full human labels, contamination-safe split
by filename hash), hierarchical-F1 with partial credit (eval_sft.py:_calculate_hierarchical_f1 — ancestor
matches earn graded credit). Greedy decoding, enable_thinking=False. See the results table above. Note: raw
H-F1 understates true performance because the ground truth is ~20–40% incomplete (many predicted codes are
genuinely depicted but unlabeled) — a judge-corrected ruler (eval_corrected.py) credits these.
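To show what "partial credit for ancestor matches" can mean in practice, here is a minimal sketch of a hierarchical F1. It is not the implementation in eval_sft.py, whose exact credit scheme is not shown here. It assumes that an ancestor in ICONCLASS can be approximated by a shared string prefix (so 11D3 is treated as an ancestor of 11D35) and that credit is the shared-prefix length divided by the longer code's length. Both are simplifying assumptions, and `prefix_credit` and `hierarchical_f1` are hypothetical names.

```python
from os.path import commonprefix


def prefix_credit(a: str, b: str) -> float:
    """Graded match in [0, 1]: shared-prefix length over the longer code."""
    if not a or not b:
        return 0.0
    shared = len(commonprefix([a, b]))
    return shared / max(len(a), len(b))


def hierarchical_f1(pred: list[str], gold: list[str]) -> float:
    """F1 where each code earns its best partial credit against the other set."""
    if not pred or not gold:
        return 0.0
    # Precision: how well each predicted code is supported by some gold code.
    precision = sum(max(prefix_credit(p, g) for g in gold) for p in pred) / len(pred)
    # Recall: how well each gold code is covered by some predicted code.
    recall = sum(max(prefix_credit(g, p) for p in pred) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # An ancestor prediction earns partial rather than zero credit.
    print(hierarchical_f1(["11D3"], ["11D35"]))
    # An unrelated prediction earns nothing.
    print(hierarchical_f1(["25F23"], ["11D35"]))
```

Under this scheme a prediction that stops one level short of the gold code still scores well, which is one reason hierarchical F1 sits far above the exact-match code precision and recall in the results table.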
The research arc (so this can be picked up later)
This model is one step in an investigation (full logs in the model-training/iconclass-qwen35 repo —
RESEARCH_LOG.md, WEAK_LABELING.md, CLAUDE.md):
- 4B is capability-bound — reward-tuning, fuller labels, and reasoning-distillation all plateau at ~25% recall / ~45 H-F1.
- Anchored fusion (4B + retriever + judge-gate, fuse_rank.py) was the prior deployable win (48.5).
- Agent + abstain-reviewer = a ~90% precision weak-labeler (weak_label_quality.py, review_set.py, calibrate_review.py), validated as a data engine for unlabeled images (235B-VL via the HF router, no setup).
- This 9B: capacity breaks the recall wall and beats the fusion pipeline → the headline result.
Suggested next steps
- Format cleanup (constrained decoding / a JSON-repair pass) to recover the ~14% invalid-JSON handicap.
- Noisy-student self-training: weak-label NEW images (biglam/european_art) with the agent + abstain-reviewer → SFT this 9B on brillfull-GT ∪ high-confidence weak-labels → eval clean-788 vs 53.0. Tests whether clean NEW-image data pushes past capacity, or the 9B is at the learnable-visual ceiling.
- External-knowledge tools at inference for the non-visual code residue (the structural ceiling).
- The retriever (davanstrien/iconclass-retriever-bge-ft) + agent stack remain useful for recall-priority cataloguing even though fusion is H-F1-redundant on this 9B.
Related
- Predecessor: davanstrien/qwen35-4b-iconclass-sft-brillfull (4B, 45.2 H-F1).
- Dataset: davanstrien/iconclass-vlm-brillfull (full-label, contamination-safe).
- Retriever: davanstrien/iconclass-retriever-bge-ft.
- Source lineage: Brill ICONCLASS AI Test Set / Arkyves.