adi-qwen3.5-9b-glm5.2-general

adi-qwen3.5-9b-glm5.2-general

Part of the ADI (Advanced Data Intelligence) model line โ€” ADI Qwen3 series.

A small, fully local model that reasons and answers like a frontier teacher. Built by distilling glm-5.2 general-knowledge responses into a Qwen3.5-9B student with a 4-bit QLoRA fine-tune, then merged to fp16, converted, and quantized to GGUF. The student base retains native tool calling and a long context window.

Base model Qwen/Qwen3.5-9B (trained from the unsloth mirror)
Teacher glm-5.2 (responses distilled, thinking disabled)
Method 4-bit QLoRA SFT (rank 16) โ†’ merge to fp16 โ†’ GGUF
Quantization Q4_K_M (5.6 GB); f16 also provided (17 GB)
License Apache-2.0 (inherited from Qwen3.5-9B)
Context 262K (inherited from base)
Tool calling Supported (inherited from base)
Architecture Qwen3.5 hybrid linear-attention + full-attention, with MTP head

Run it

Pull directly into Ollama:

ollama run hf.co/AdvancedDataIntelligence/adi-qwen3.5-9b-glm5.2-general-GGUF:Q4_K_M

Or download the .gguf and point any llama.cpp-based runtime at it.

What this model is

This is a knowledge distillation: a strong teacher (glm-5.2) generated high-quality answers across ~2,000 diverse general-knowledge prompts, and the Qwen3.5-9B student was fine-tuned to imitate them. The result reasons and responds noticeably more like its teacher on general topics, while staying small enough to run on a single consumer GPU.

What distillation does โ€” and doesn't do. It transfers the teacher's reasoning style and answer quality, not net-new facts. For raw factual recall, retrieval-augmented generation (RAG) is the right tool, not fine-tuning. What you get here is a 9B that structures and explains like a much larger model on topics it already partly knows.

Training

Metric Value
Training pairs 2,000
Dataset glm5.2-general-distill (train2k subset)
Epochs 3
Steps 750
Final train loss 0.8535
LoRA rank / alpha 16 / 16
Trainable params 29.1M (~0.50%)
Precision 4-bit QLoRA (nf4)
Peak VRAM 9.6 GB
Hardware single RTX 5060 Ti (16 GB)
Training time 2h 54m

The seed prompts were drawn from the human-written Databricks Dolly-15k dataset (filtered to remove items requiring an attached context passage, then deduplicated). The teacher was queried with thinking disabled so the student learns clean final answers rather than chain-of-thought.

Notes for re-builders

  • Version pins matter. Qwen3.5 requires transformers >= 5.2.0 to be recognized by Unsloth (min reported 5.2.0); the working combination is transformers == 5.5.0 with torch 2.10.0+cu128 and unsloth 2026.6.8.
  • This build used 4-bit QLoRA. It trained cleanly (loss 0.8535, peak 9.6 GB VRAM). Note that Qwen3.5's gated-delta / linear-attention layers can quantize less gracefully than dense models โ€” a bf16 LoRA pass is a reasonable upgrade path for a v2.
  • GGUF conversion used llama.cpp's convert_hf_to_gguf.py, which understands the Qwen3.5 SSM/MTP architecture and auto-skips the MTP tensors (no --no-mtp flag needed). The fp16 base (18 GB) exceeds 16 GB VRAM, so the LoRA was merged with a streaming shard-by-shard merge rather than an in-VRAM merge.
  • An explicit Qwen chat TEMPLATE plus <|im_end|> / <|endoftext|> stop tokens are set in the Modelfile to avoid runaway generation.

Intended use

General-purpose local assistant: explanations, reasoning, Q&A, and tool-calling workflows where a small, private, offline-capable model is preferred over a hosted API. Not intended as a source of authoritative facts without retrieval.

License

Apache-2.0, inherited from the Qwen3.5-9B base model. Distilled training data was generated using glm-5.2; users should review the teacher model's terms for their own use case.


Built at theLAB โ€” Learning. Algorithms. Breakthroughs.

Downloads last month
-
GGUF
Model size
9B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AdvancedDataIntelligence/adi-qwen3.5-9b-glm5.2-general-GGUF

Finetuned
Qwen/Qwen3.5-9B
Quantized
(323)
this model