Instructions for using iAmBoosted/Qwen3.5-9B-OSS-Distilled with libraries and local apps.

Libraries

How to use iAmBoosted/Qwen3.5-9B-OSS-Distilled with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

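# The messages below include an image, so the task must be "image-text-to-text".
# Note: the model card states that multimodal behavior is untested after fine-tuning.
pipe = pipeline("image-text-to-text", model="iAmBoosted/Qwen3.5-9B-OSS-Distilled")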
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("iAmBoosted/Qwen3.5-9B-OSS-Distilled")
model = AutoModelForImageTextToText.from_pretrained("iAmBoosted/Qwen3.5-9B-OSS-Distilled")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use iAmBoosted/Qwen3.5-9B-OSS-Distilled with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "iAmBoosted/Qwen3.5-9B-OSS-Distilled"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "iAmBoosted/Qwen3.5-9B-OSS-Distilled",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/iAmBoosted/Qwen3.5-9B-OSS-Distilled

SGLang

How to use iAmBoosted/Qwen3.5-9B-OSS-Distilled with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "iAmBoosted/Qwen3.5-9B-OSS-Distilled" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "iAmBoosted/Qwen3.5-9B-OSS-Distilled",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "iAmBoosted/Qwen3.5-9B-OSS-Distilled" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "iAmBoosted/Qwen3.5-9B-OSS-Distilled",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use iAmBoosted/Qwen3.5-9B-OSS-Distilled with Docker Model Runner:
```
docker model run hf.co/iAmBoosted/Qwen3.5-9B-OSS-Distilled
```

Qwen3.5-9B-OSS-Distilled

A reasoning-style distillation of Qwen/Qwen3.5-9B. The goal was a change in behavior, not in capability: on hard prompts, stock Qwen3.5-9B frequently spirals, wandering inside its <think> block without ever terminating with an answer. This model was fine-tuned to adopt the tight, terminating reasoning style of openai/gpt-oss-20b, so that it reliably finishes reasoning and produces an answer.

TL;DR

No-answer ("spiral-out") rate on a 400-prompt hard holdout: 36.2% → 0.5%.
On the 219 judged pairs (drawn from the 251 prompts where both models produced a usable answer), a blind A/B judge preferred this model in 60.3% of decided pairs (ties excluded).
This is a style fix. It does not add knowledge or raise the raw problem-solving ceiling.

Model details

Base model: Qwen/Qwen3.5-9B (Apache-2.0)
Teacher: openai/gpt-oss-20b (Apache-2.0)
Method: LoRA supervised fine-tuning (rank 16, alpha 16, bf16) with Unsloth, then merged into a standalone 16-bit model.
Training data: iAmBoosted/gpt-oss-20b-reasoning-traces — 3,333 filtered GPT-OSS-20B reasoning traces.
Language: English
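
The merge step named in the Method line can be written out as follows. This is a sketch that assumes the standard LoRA parameterization; the card does not say whether a variant such as rank-stabilized scaling was used, and the symbols $W_0$, $A$, $B$, $d_{\text{in}}$, and $d_{\text{out}}$ are introduced here for illustration.

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
r = 16,\ \alpha = 16 \;\Rightarrow\; \frac{\alpha}{r} = 1
```

Under that assumption, rank 16 with alpha 16 gives a scaling factor of 1, so each adapter product is added to its base weight matrix unscaled, and the merged checkpoint needs no adapter at inference time.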

Note on the base model. Qwen3.5-9B is a vision-language model. This distillation used text-only data and was evaluated on text-only prompts. Only the language/reasoning behavior was changed; any multimodal capability of the base is untested after fine-tuning and should not be relied on.

Intended use

Use it where you want Qwen3.5-9B-class reasoning that reliably terminates — math, science, code, and logic prompts that tend to make the stock model run away inside its reasoning. It is also a small, reproducible case study in reasoning-style distillation.

Out of scope: this is not a capability upgrade. It does not know more than the base model and should not be expected to beat it on tasks the base already handles well. Multimodal use is untested.

Evaluation

Evaluated on a 400-prompt held-out set drawn from the same sources as the training data. None of the held-out prompts were trained on.

Termination (the spiral fix)

Metric	Stock Qwen3.5-9B	Distilled
Answered (`ok`)	251 / 400	397 / 400
No answer (`empty`)	145 (36.2%)	2 (0.5%)
Truncated	4 (1.0%)	1 (0.2%)
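
The three outcome labels in the table can be illustrated with a small classifier. This is a minimal sketch and not the evaluation script used for this card: the rule that an output is `empty` when nothing follows the closing `</think>` tag, the `hit_token_limit` flag, and the names `classify_output` and `outcome_rates` are all assumptions made for illustration.

```python
from collections import Counter

THINK_OPEN = "<think>"    # assumed tag that opens the reasoning block
THINK_CLOSE = "</think>"  # assumed tag that closes it


def classify_output(text: str, hit_token_limit: bool = False) -> str:
    """Label one generation as 'ok', 'empty', or 'truncated' (hypothetical rules)."""
    if THINK_CLOSE in text:
        # Reasoning terminated: the answer is whatever follows the closing tag.
        answer = text.split(THINK_CLOSE, 1)[1].strip()
    elif THINK_OPEN in text:
        # Reasoning was opened but never closed, so no answer was produced.
        answer = ""
    else:
        # No reasoning tags at all: treat the whole output as the answer.
        answer = text.strip()
    if not answer:
        return "empty"
    # An answer exists but generation stopped at the token budget.
    return "truncated" if hit_token_limit else "ok"


def outcome_rates(labels: list[str]) -> dict[str, tuple[int, float]]:
    """Return {label: (count, percent)} for the three outcome labels."""
    counts = Counter(labels)
    total = len(labels)
    return {
        name: (counts[name], 100.0 * counts[name] / total if total else 0.0)
        for name in ("ok", "empty", "truncated")
    }


# Re-deriving the stock-model column from the counts in the table above.
stock_labels = ["ok"] * 251 + ["empty"] * 145 + ["truncated"] * 4
print(outcome_rates(stock_labels)["empty"])  # (145, 36.25)
```

The exact value 36.25% is what the table reports as 36.2%. If the chat template places the opening tag in the prompt, an unterminated output would contain neither tag, so a real harness would need a stricter rule than this one.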

Blind quality judgment

A blind, randomized A/B judge (a Gemma-class model, with no knowledge of which answer came from which model) compared the two models on the 251 prompts where both produced a usable answer. Of those, 219 pairs were scored; the other 32 were dropped because of API or parse failures during judging.

Outcome	Count	Share
Distilled preferred	105	47.9%
Tie	45	20.5%
Baseline preferred	69	31.5%

Ties excluded, the distilled model was preferred in 60.3% of decided pairs.

Domain	Distilled / Tie / Baseline
physics	29 / 0 / 12
biology	26 / 0 / 14
chemistry	25 / 3 / 18
code	16 / 4 / 12
math	7 / 23 / 7
puzzle	2 / 15 / 6
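
The per-domain rows can be checked against the overall figures with a few lines of arithmetic. The sketch below copies the counts from the table above; the names `DOMAIN_COUNTS` and `decided_share` are illustrative and do not come from the original evaluation code.

```python
# Judge counts per domain, copied from the table: (distilled, tie, baseline).
DOMAIN_COUNTS = {
    "physics": (29, 0, 12),
    "biology": (26, 0, 14),
    "chemistry": (25, 3, 18),
    "code": (16, 4, 12),
    "math": (7, 23, 7),
    "puzzle": (2, 15, 6),
}


def decided_share(distilled: int, baseline: int) -> float:
    """Percent of decided (non-tie) pairs in which the distilled model was preferred."""
    decided = distilled + baseline
    return 100.0 * distilled / decided if decided else float("nan")


# Column sums should reproduce the overall outcome table (105 / 45 / 69).
totals = tuple(sum(column) for column in zip(*DOMAIN_COUNTS.values()))
overall = decided_share(totals[0], totals[2])

for domain, (d, t, b) in DOMAIN_COUNTS.items():
    print(f"{domain:<10} decided={d + b:>3}  distilled share={decided_share(d, b):5.1f}%")
print(f"{'overall':<10} decided={totals[0] + totals[2]:>3}  distilled share={overall:5.1f}%")
```

The column sums come to 105, 45, and 69, matching the outcome table, and 105 of 174 decided pairs gives the 60.3% quoted above. The per-domain shares rest on small counts (8 decided pairs for puzzle, 14 for math), so they are best read as rough indications.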

Limitations

Style, not capability. The win is reliable termination and a cleaner reasoning style — not new knowledge or higher raw accuracy.
Puzzle domain. On puzzle prompts the baseline was actually preferred (6 decided pairs vs 2, with 15 ties). The tighter reasoning style appears to trim the exploratory wandering that some puzzles benefit from.
Math domain. Math is roughly even (7 / 7, with 23 ties): distillation neither clearly helped nor hurt math quality.
The judge was an LLM and was not human-validated. Treat the 60.3% as indicative, not definitive.
Coverage. Evaluation is a single 400-prompt holdout, and 32 of the 251 comparable pairs were dropped due to API/parse failures during judging.
Multimodal behavior is untested (see the note above).

How to use

Load and run it exactly as you would the Qwen3.5-9B base model — this is a standard merged fine-tune. Qwen3.5 requires a recent transformers (and a recent vLLM if you serve it that way); see the base model card for the current version requirements and the canonical loading snippet.
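
For a quick text-only check against a locally served copy, a request can be sent from Python using only the standard library. This is a sketch under assumptions: it presumes an OpenAI-compatible server (for example one started with `vllm serve`) is already listening at `http://localhost:8000`, and the names `build_chat_payload` and `ask`, the `max_tokens` default, and the timeout are illustrative choices and not part of the model card. Whether the reasoning text appears inside the returned content depends on how the server is configured.

```python
import json
import urllib.request

MODEL_ID = "iAmBoosted/Qwen3.5-9B-OSS-Distilled"
BASE_URL = "http://localhost:8000/v1"  # assumption: default address of a local server


def build_chat_payload(prompt: str, max_tokens: int = 2048) -> dict:
    """Build a text-only request body for the /chat/completions endpoint."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed budget; raise it for long reasoning
    }


def ask(prompt: str, timeout: float = 600.0) -> str:
    """Send one prompt to the server and return the assistant message content."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Content may be null in some server configurations; normalize to a string.
    return body["choices"][0]["message"]["content"] or ""
```

With the server running, `print(ask("What is the capital of France?"))` sends the same question as the curl examples. An empty string coming back on a hard prompt is the symptom this fine-tune is meant to reduce.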

License & attribution

Released under Apache-2.0, inherited from the Qwen3.5-9B base. Teacher outputs come from GPT-OSS-20B (Apache-2.0). Built with Unsloth. Training prompts derive from several open datasets with mixed licenses — see the dataset card for full source attribution and licensing.

Safetensors

Model size: 10B params
Tensor type: BF16, F32

Model tree for iAmBoosted/Qwen3.5-9B-OSS-Distilled

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B