adi-qwen2.5-coder-7b-kimi2.7-code

adi-qwen2.5-coder-7b-kimi2.7-code

Part of the ADI (Advanced Data Intelligence) model line โ€” ADI Qwen2.5 series.

A small, fully local coding model that writes code like a frontier teacher. Built by distilling kimi-k2.7-code coding responses into a Qwen2.5-Coder-7B student with a 4-bit QLoRA fine-tune, then merged, converted, and quantized to GGUF. The student base retains native tool calling and a long context window.

Base model Qwen/Qwen2.5-Coder-7B
Teacher kimi-k2.7-code (responses distilled, thinking disabled)
Method 4-bit QLoRA SFT (rank 16) โ†’ merge โ†’ GGUF
Quantization Q4_K_M (~4.4 GB)
License Apache-2.0 (inherited from Qwen2.5-Coder-7B)
Context 128K (inherited from base)
Tool calling Supported (inherited from base)

Run it

Pull directly into Ollama:

ollama run hf.co/AdvancedDataIntelligence/adi-qwen2.5-coder-7b-kimi2.7-code-GGUF:Q4_K_M

Or download the .gguf and point any llama.cpp-based runtime at it.

What this model is

This is a knowledge distillation: a strong coding teacher (kimi-k2.7-code) generated high-quality solutions across ~2,000 diverse programming prompts, and the Qwen2.5-Coder-7B student was fine-tuned to imitate them. The result writes and explains code noticeably more like its teacher, while staying small enough to run on a single consumer GPU.

What distillation does โ€” and doesn't do. It transfers the teacher's coding style and solution quality, not net-new knowledge of every library or API. A 7B model won't memorize all of PyPI. What you get here is a 7B that structures, explains, and writes code more like a much larger model on tasks it already partly knows.

Training

Metric Value
Training pairs 2,000
Teacher tokens generated ~1.58M
Epochs 3
Steps 750
Final train loss 0.7623
LoRA rank / alpha 16 / 16
Trainable params 40.4M (0.53% of 7.66B)
Precision 4-bit QLoRA
Hardware single RTX 5060 Ti (16 GB)
Training time 2h 01m

The seed prompts were drawn from the glaive-code-assistant dataset (filtered by length and deduplicated). The teacher was queried with thinking disabled so the student learns clean, direct solutions.

Notes for re-builders

  • Qwen2.5-Coder trains cleanly in 4-bit QLoRA. Unlike the Mamba-hybrid Qwen3.5, the standard Qwen2 architecture quantizes well for training; QLoRA uses ~12 GB on a 7B โ€” comfortable on a 16 GB card.
  • GGUF conversion was done with llama.cpp's convert_hf_to_gguf.py. Qwen2.5-Coder is a long-supported standard architecture, so conversion is straightforward.
  • The merged model preserves the Qwen2.5 chat template with tool-calling support.

Intended use

Local coding assistant: code generation, explanation, debugging, refactoring, and tool-calling workflows where a small, private, offline-capable model is preferred over a hosted API.

License

Apache-2.0, inherited from the Qwen2.5-Coder-7B base model. You are free to use, modify, and redistribute under the terms of that license. Distilled training data was generated using kimi-k2.7-code; users should review the teacher model's terms for their own use case.


Built at theLAB โ€” Learning. Algorithms. Breakthroughs.

Downloads last month
213
GGUF
Model size
8B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AdvancedDataIntelligence/adi-qwen2.5-coder-7b-kimi2.7-code-GGUF

Base model

Qwen/Qwen2.5-7B
Quantized
(43)
this model