adi-qwen2.5-14b-glm5.2-general

adi-qwen2.5-14b-glm5.2-general

Part of the ADI (Advanced Data Intelligence) model line โ€” ADI Qwen series.

A compact, fully local model that reasons and answers like a frontier teacher. Built by distilling glm-5.2 general-knowledge responses into a Qwen2.5-14B-Instruct student with a 4-bit QLoRA fine-tune, then merged, converted, and quantized to GGUF. The largest general ADI model to date โ€” more parametric headroom than the 8B, still small enough to run on a single 16 GB consumer GPU. The student base retains native tool calling and a long context window.

Base model Qwen/Qwen2.5-14B-Instruct
Teacher glm-5.2 (responses distilled, thinking disabled)
Method 4-bit QLoRA SFT (rank 16) โ†’ merge โ†’ GGUF
Quantization Q4_K_M (~8.4 GB, 4.87 bpw)
License Apache-2.0 (inherited from Qwen2.5-14B)
Context 128K (inherited from base)
Tool calling Supported (inherited from base)

Run it

Pull directly into Ollama:

ollama run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Or download the .gguf and point any llama.cpp-based runtime at it.

What this model is

This is a knowledge distillation: a strong teacher (glm-5.2) generated high-quality answers across a clean general-knowledge prompt set, and the Qwen2.5-14B-Instruct student was fine-tuned to imitate them. The result reasons and responds noticeably more like its teacher on general topics, with the most headroom of any general model in the ADI line, while still fitting on a single consumer GPU.

What distillation does โ€” and doesn't do. It transfers the teacher's reasoning style and answer quality, not net-new facts. A 14B model carries more parametric knowledge than the smaller ADI students, but it still isn't an encyclopedia. For raw factual recall, retrieval-augmented generation (RAG) is the right tool, not fine-tuning. What you get here is a 14B that structures and explains like a much larger model on topics it already partly knows.

Training

Metric Value
Training pairs 2,000 (deterministic subset of a 4,982-pair clean set)
Teacher tokens generated ~3.58M output tokens
Epochs 3
Steps 750
Final train loss 0.9086 (mean; per-step down to ~0.74)
LoRA rank / alpha 16 / 16
Trainable params 68.8M (0.46% of 14.84B)
Precision 4-bit QLoRA (nf4)
Peak VRAM 12.05 GB
Hardware single RTX 5060 Ti (16 GB)
Training time 4.24 h (~20 s/step)

The seed prompts were drawn from the human-written Databricks Dolly-15k dataset (filtered to remove items requiring an attached context passage, then deduplicated). The teacher was queried with thinking disabled so the student learns clean final answers rather than chain-of-thought.

Notes for re-builders

  • 4-bit QLoRA via Unsloth with gradient checkpointing ("unsloth" mode), max_seq_length 2048, per-device batch 1 ร— grad-accum 8, paged_adamw_8bit, LoRA targeting all attention + MLP projections. Peak VRAM held at 12.05 GB on a 16 GB card.
  • GGUF conversion was done via streaming LoRA merge โ†’ f16 GGUF (28 GB intermediate) โ†’ Q4_K_M quantize (8.4 GB, 4.87 bpw) with llama.cpp.

Intended use

General-purpose local assistant: explanations, reasoning, Q&A, and tool-calling workflows where a capable, private, offline-capable model is preferred over a hosted API. Not intended as a source of authoritative facts without retrieval.

License

Apache-2.0, inherited from the Qwen2.5-14B-Instruct base model. You are free to use, modify, and redistribute under the terms of that license. Distilled training data was generated using glm-5.2; users should review the teacher model's terms for their own use case.


Built at theLAB โ€” Learning. Algorithms. Breakthroughs.

Downloads last month
-
GGUF
Model size
15B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF

Base model

Qwen/Qwen2.5-14B
Quantized
(140)
this model