adi-qwen3.5-4b-glm5.2-general

adi-qwen3.5-4b-glm5.2-general

Part of the ADI (Advanced Data Intelligence) model line โ€” ADI Qwen3 series.

A small, fully local model that reasons and answers like a frontier teacher. Built by distilling glm-5.2 general-knowledge responses into a Qwen3.5-4B student with a bf16 LoRA fine-tune, then merged, converted, and quantized to GGUF. The student base retains native tool calling and a long context window.

Base model Qwen/Qwen3.5-4B
Teacher glm-5.2 (responses distilled, thinking disabled)
Method bf16 LoRA SFT (rank 16) โ†’ merge โ†’ GGUF
Quantization Q4_K_M (~2.7 GB)
License Apache-2.0 (inherited from Qwen3.5-4B)
Context 262K (inherited from base)
Tool calling Supported (inherited from base)

Run it

Pull directly into Ollama:

ollama run hf.co/AdvancedDataIntelligence/adi-qwen3.5-4b-glm5.2-general-GGUF:Q4_K_M

Or download the .gguf and point any llama.cpp-based runtime at it.

What this model is

This is a knowledge distillation: a strong teacher (glm-5.2) generated high-quality answers across ~2,000 diverse general-knowledge prompts, and the Qwen3.5-4B student was fine-tuned to imitate them. The result reasons and responds noticeably more like its teacher on general topics, while staying small enough to run on a single consumer GPU.

What distillation does โ€” and doesn't do. It transfers the teacher's reasoning style and answer quality, not net-new facts. A 4B model won't become an encyclopedia. For raw factual recall, retrieval-augmented generation (RAG) is the right tool, not fine-tuning. What you get here is a 4B that structures and explains like a much larger model on topics it already partly knows.

Training

Metric Value
Training pairs 2,068
Teacher tokens generated ~1.36M
Epochs 3
Steps 777
Final train loss 0.9346
LoRA rank / alpha 16 / 16
Trainable params 21.2M (0.47% of 4.56B)
Precision bf16 (not 4-bit โ€” see note)
Hardware single RTX 5060 Ti (16 GB)
Training time 2h 53m

The seed prompts were drawn from the human-written Databricks Dolly-15k dataset (filtered to remove items requiring an attached context passage, then deduplicated). The teacher was queried with thinking disabled so the student learns clean final answers rather than chain-of-thought it is too small to reproduce well.

Notes for re-builders

  • Qwen3.5 trains in bf16 LoRA, not 4-bit QLoRA. Its gated-delta / Mamba-hybrid layers quantize poorly during training; 4-bit costs accuracy. bf16 LoRA uses ~10 GB on a 4B โ€” comfortable on a 16 GB card.
  • Version pins: Qwen3.5 requires transformers >= 5.2.0 to be recognized, while the Unsloth training stack caps at <= 5.5.0. The working version is transformers == 5.5.0 with numpy < 2.3.
  • GGUF conversion was done with llama.cpp's convert_hf_to_gguf.py, which already understands the Qwen3.5 SSM/MTP architecture.

Intended use

General-purpose local assistant: explanations, reasoning, Q&A, and tool-calling workflows where a small, private, offline-capable model is preferred over a hosted API. Not intended as a source of authoritative facts without retrieval.

License

Apache-2.0, inherited from the Qwen3.5-4B base model. You are free to use, modify, and redistribute under the terms of that license. Distilled training data was generated using glm-5.2; users should review the teacher model's terms for their own use case.


Built at theLAB โ€” Learning. Algorithms. Breakthroughs.

Downloads last month
620
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AdvancedDataIntelligence/adi-qwen3.5-4b-glm5.2-general-GGUF

Finetuned
Qwen/Qwen3.5-4B
Quantized
(267)
this model

Spaces using AdvancedDataIntelligence/adi-qwen3.5-4b-glm5.2-general-GGUF 3