Qwen3.5-9B-OSS-Distilled
A reasoning-style distillation of Qwen/Qwen3.5-9B. The goal was a behavioral change rather than a capability gain: stock Qwen3.5-9B frequently spirals on hard prompts, wandering inside its <think> block and never terminating with an answer. This model was fine-tuned to adopt the tight, terminating reasoning style of openai/gpt-oss-20b so that it reliably finishes reasoning and produces an answer.
TL;DR
- No-answer ("spiral-out") rate on a 400-prompt hard holdout: 36.2% → 0.5%.
- On the 219 scored pairs (out of 251 prompts where both models produced a usable answer), a blind A/B judge preferred this model in 60.3% of decided pairs (ties excluded).
- This is a style fix. It does not add knowledge or raise the raw problem-solving ceiling.
Model details
- Base model: Qwen/Qwen3.5-9B (Apache-2.0)
- Teacher: openai/gpt-oss-20b (Apache-2.0)
- Method: LoRA supervised fine-tuning (rank 16, alpha 16, bf16) with Unsloth, then merged into a standalone 16-bit model.
- Training data: iAmBoosted/gpt-oss-20b-reasoning-traces — 3,333 filtered GPT-OSS-20B reasoning traces.
- Language: English
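The "rank 16, alpha 16" setting and the merge step can be illustrated with the standard LoRA update, W' = W + (alpha / rank) * B * A. The sketch below is plain Python on toy matrices, not the Unsloth training code used for this model; the matrix sizes and random values are made up for illustration. It only shows that with alpha equal to rank the scaling factor is 1.0, and that a merged weight gives the same output as the base weight plus the adapter path.

```python
import random

RANK, ALPHA = 16, 16      # values stated in the model details
SCALE = ALPHA / RANK      # standard LoRA scaling factor, 1.0 here


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, scale=SCALE):
    """Return W + scale * (B @ A), i.e. the merged standalone weight."""
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy shapes (hypothetical): W is d_out x d_in, B is d_out x r, A is r x d_in.
rng = random.Random(0)
d_out, d_in = 8, 6
w = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_b = [[rng.uniform(-1, 1) for _ in range(RANK)] for _ in range(d_out)]
lora_a = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(RANK)]
x = [[rng.uniform(-1, 1)] for _ in range(d_in)]   # input as a column vector

merged = merge_lora(w, lora_b, lora_a)

# The merged weight reproduces "base output + scaled adapter output".
y_merged = matmul(merged, x)
y_base = matmul(w, x)
y_adapter = matmul(lora_b, matmul(lora_a, x))
y_split = [[b[0] + SCALE * a[0]] for b, a in zip(y_base, y_adapter)]
```

Because the two paths agree up to floating-point error, the merged checkpoint can be loaded without any adapter library at inference time.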
Note on the base model. Qwen3.5-9B is a vision-language model. This distillation used text-only data and was evaluated on text-only prompts. Only the language/reasoning behavior was changed; any multimodal capability of the base is untested after fine-tuning and should not be relied on.
Intended use
Use it where you want Qwen3.5-9B-class reasoning that reliably terminates — math, science, code, and logic prompts that tend to make the stock model run away inside its reasoning. It is also a small, reproducible case study in reasoning-style distillation.
Out of scope: this is not a capability upgrade. It does not know more than the base model and should not be expected to beat it on tasks the base already handles well. Multimodal use is untested.
Evaluation
Evaluated on a 400-prompt held-out set drawn from the same sources as the training data. None of the held-out prompts were trained on.
Termination (the spiral fix)
| Metric | Stock Qwen3.5-9B | Distilled |
|---|---|---|
| Answered (ok) | 251 / 400 | 397 / 400 |
| No answer (empty) | 145 (36.2%) | 2 (0.5%) |
| Truncated | 4 (1.0%) | 1 (0.2%) |
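One plausible way to bucket a generation into the three rows of this table is sketched below. The card does not spell out the classification rule, so everything here is an assumption: `classify_output` and its `hit_token_limit` flag are hypothetical names, and the rule "no text after the closing `</think>` tag means no answer" is a guess at what "empty" means.

```python
THINK_CLOSE = "</think>"


def classify_output(text: str, hit_token_limit: bool) -> str:
    """Bucket one generation as 'ok', 'empty', or 'truncated' (assumed rule)."""
    # Treat everything after the last closing think tag as the answer.
    # If the tag never appears, the model is assumed to be still reasoning.
    _, sep, tail = text.rpartition(THINK_CLOSE)
    answer = tail.strip() if sep else ""
    if not answer:
        return "empty"        # reasoning never terminated in an answer
    if hit_token_limit:
        return "truncated"    # an answer started but generation was cut off
    return "ok"


examples = [
    ("<think>2 + 2 = 4</think>The answer is 4.", False),
    ("<think>hmm, wait, let me reconsider", True),
    ("<think>done</think>The answer is", True),
]
labels = [classify_output(text, limit) for text, limit in examples]
```

Under this assumed rule, a "spiral-out" is any generation that exhausts its budget before the closing tag, which is the failure mode the fine-tune targets.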
Blind quality judgment
A blind, randomized A/B judge (a Gemma-class model, with no knowledge of which answer came from which model) compared the two models on the 251 prompts where both produced a usable answer; 219 pairs were scored.
| Outcome | Count | Share |
|---|---|---|
| Distilled preferred | 105 | 47.9% |
| Tie | 45 | 20.5% |
| Baseline preferred | 69 | 31.5% |
Ties excluded, the distilled model was preferred in 60.3% of decided pairs.
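The arithmetic behind these percentages is small enough to check directly. The snippet below recomputes the shares from the counts in the outcome table; the variable names are illustrative only.

```python
# Outcome counts copied from the blind-judgment table.
counts = {"distilled": 105, "tie": 45, "baseline": 69}

scored = sum(counts.values())                        # all scored pairs
decided = counts["distilled"] + counts["baseline"]   # pairs with a winner

# Share of all scored pairs, in percent, rounded to one decimal place.
share_all = {name: round(100 * n / scored, 1) for name, n in counts.items()}

# Share of decided pairs won by the distilled model (ties excluded).
win_rate_decided = round(100 * counts["distilled"] / decided, 1)

print(scored, decided)      # 219 174
print(share_all)
print(win_rate_decided)     # 60.3
```

The two denominators differ: 219 for the "Share" column, 174 for the ties-excluded figure.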
| Domain | Distilled / Tie / Baseline |
|---|---|
| physics | 29 / 0 / 12 |
| biology | 26 / 0 / 14 |
| chemistry | 25 / 3 / 18 |
| code | 16 / 4 / 12 |
| math | 7 / 23 / 7 |
| puzzle | 2 / 15 / 6 |
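To compare domains on the same footing as the overall figure, the per-domain counts can be turned into ties-excluded win rates. This is a small sketch using only the numbers in the table above; `decided_win_rate` is a helper name introduced here for illustration.

```python
# (distilled, tie, baseline) counts per domain, copied from the table.
by_domain = {
    "physics": (29, 0, 12),
    "biology": (26, 0, 14),
    "chemistry": (25, 3, 18),
    "code": (16, 4, 12),
    "math": (7, 23, 7),
    "puzzle": (2, 15, 6),
}


def decided_win_rate(distilled, tie, baseline):
    """Fraction of decided (non-tie) pairs won by the distilled model."""
    decided = distilled + baseline
    return distilled / decided if decided else None


# Column sums should reproduce the overall outcome counts.
totals = tuple(sum(col) for col in zip(*by_domain.values()))

for domain, row in by_domain.items():
    rate = decided_win_rate(*row)
    print(f"{domain:<10} {rate:.1%} of decided pairs, {row[1]} ties")
```

The column sums match the overall table, and the math and puzzle rows stand out for having far more ties than decided pairs, so their win rates rest on very few comparisons.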
Limitations
- Style, not capability. The win is reliable termination and a cleaner reasoning style — not new knowledge or higher raw accuracy.
- Puzzle domain. On puzzle prompts the baseline was actually preferred (6 decided pairs vs 2). The tighter reasoning style appears to trim the exploratory wandering that some puzzles benefit from.
- Math is roughly even (7 / 7, with 23 ties) — distillation neither clearly helped nor hurt math quality.
- The judge was an LLM and was not human-validated. Treat the 60.3% as indicative, not definitive.
- Coverage. Evaluation is a single 400-prompt holdout; 32 of the 251 comparable pairs (251 − 219) were dropped due to API/parse failures during judging.
- Multimodal behavior is untested (see the note above).
How to use
Load and run it exactly as you would the Qwen3.5-9B base model — this is a standard merged fine-tune. Qwen3.5 requires a recent transformers (and a recent vLLM if you serve it that way); see the base model card for the current version requirements and the canonical loading snippet.
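For served use, a minimal client sketch follows. It assumes the model is already being served behind an OpenAI-compatible endpoint (for example by vLLM) at `http://localhost:8000/v1`; that address, the `max_tokens` default, and the helper names `build_chat_request` and `ask` are assumptions for illustration, not part of the model card. Only the Python standard library is used.

```python
import json
import urllib.request

MODEL_ID = "iAmBoosted/Qwen3.5-9B-OSS-Distilled"
BASE_URL = "http://localhost:8000/v1"   # assumed local server address


def build_chat_request(prompt: str, max_tokens: int = 2048) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,   # leave room for reasoning plus the answer
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` returns the assistant message as a string; nothing is sent until `ask` is called.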
License & attribution
Released under Apache-2.0, inherited from the Qwen3.5-9B base. Teacher outputs come from GPT-OSS-20B (Apache-2.0). Built with Unsloth. Training prompts derive from several open datasets with mixed licenses — see the dataset card for full source attribution and licensing.
