adi-qwen2.5-14b-glm5.2-general
Part of the ADI (Advanced Data Intelligence) model line, ADI Qwen series.
A compact, fully local model that reasons and answers like a frontier teacher. Built by distilling glm-5.2 general-knowledge responses into a Qwen2.5-14B-Instruct student with a 4-bit QLoRA fine-tune, then merged, converted, and quantized to GGUF. This is the largest general ADI model to date: it has more parametric headroom than the 8B, yet is still small enough to run on a single 16 GB consumer GPU. The student base retains native tool calling and a long context window.
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B-Instruct |
| Teacher | glm-5.2 (responses distilled, thinking disabled) |
| Method | 4-bit QLoRA SFT (rank 16) → merge → GGUF |
| Quantization | Q4_K_M (~8.4 GB, 4.87 bpw) |
| License | Apache-2.0 (inherited from Qwen2.5-14B) |
| Context | 128K (inherited from base) |
| Tool calling | Supported (inherited from base) |
Run it
Pull directly into Ollama:
ollama run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
Or download the .gguf and point any llama.cpp-based runtime at it.
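For Python callers, llama-cpp-python is one such llama.cpp-based runtime. The sketch below assumes the package is installed (`pip install llama-cpp-python`) and that the quantized file in the repository is named `adi-qwen2.5-14b-glm5.2-general-q4_k_m.gguf`; the helper names `build_messages` and `ask`, and the `n_ctx` value, are illustrative choices rather than part of the model release.

```python
REPO_ID = "AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF"
# Assumed file name of the Q4_K_M quantization inside the repository.
GGUF_FILE = "adi-qwen2.5-14b-glm5.2-general-q4_k_m.gguf"


def build_messages(question, system_prompt=None):
    """Build an OpenAI-style chat message list for a single user turn."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def ask(question, n_ctx=4096):
    """Fetch the GGUF from the Hub on first use and answer one question."""
    # Imported lazily so build_messages stays usable without the package.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=GGUF_FILE,
        n_ctx=n_ctx,  # illustrative context size, well below the 128K maximum
    )
    result = llm.create_chat_completion(messages=build_messages(question))
    return result["choices"][0]["message"]["content"]
```

Calling `ask("What is the capital of France?")` downloads roughly 8.4 GB on the first run, so it is best tried on a machine with enough disk space and memory for the Q4_K_M file.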
What this model is
This is a knowledge distillation: a strong teacher (glm-5.2) generated
high-quality answers across a clean general-knowledge prompt set, and the
Qwen2.5-14B-Instruct student was fine-tuned to imitate them. The result reasons
and responds noticeably more like its teacher on general topics, with the most
headroom of any general model in the ADI line, while still fitting on a single
consumer GPU.
What distillation does, and what it doesn't. It transfers the teacher's reasoning style and answer quality, not net-new facts. A 14B model carries more parametric knowledge than the smaller ADI students, but it still isn't an encyclopedia. For raw factual recall, retrieval-augmented generation (RAG) is the right tool, not fine-tuning. What you get here is a 14B that structures and explains like a much larger model on topics it already partly knows.
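To make the idea concrete, response distillation of this kind reduces to supervised fine-tuning on (prompt, teacher answer) pairs. The following is a minimal sketch of how one such pair could be packaged as a chat-format record; the function names and the JSONL layout are assumptions for illustration, not the format actually used to train this model.

```python
import json


def to_sft_record(prompt, teacher_answer):
    """Pair a prompt with the teacher's final answer as one chat-format example."""
    return {
        "messages": [
            {"role": "user", "content": prompt.strip()},
            # The student is trained to reproduce this text, so it learns the
            # teacher's style and structure rather than any new facts.
            {"role": "assistant", "content": teacher_answer.strip()},
        ]
    }


def to_jsonl(records):
    """Serialize records one JSON object per line (hypothetical storage format)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


example = to_sft_record(
    "  Why is the sky blue? ",
    "Shorter wavelengths scatter more strongly in the atmosphere. ",
)
```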
Training
| Metric | Value |
|---|---|
| Training pairs | 2,000 (deterministic subset of a 4,982-pair clean set) |
| Teacher tokens generated | ~3.58M output tokens |
| Epochs | 3 |
| Steps | 750 |
| Final train loss | 0.9086 (mean; per-step down to ~0.74) |
| LoRA rank / alpha | 16 / 16 |
| Trainable params | 68.8M (0.46% of 14.84B) |
| Precision | 4-bit QLoRA (nf4) |
| Peak VRAM | 12.05 GB |
| Hardware | single RTX 5060 Ti (16 GB) |
| Training time | 4.24 h (~20 s/step) |
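The figures in this table are mutually consistent, which the short calculation below checks. It assumes the effective batch size of 8 (per-device batch 1 with 8 gradient-accumulation steps) given in the notes for re-builders; the variable names are illustrative.

```python
import math

train_pairs = 2_000
epochs = 3
per_device_batch = 1
grad_accum = 8

# One optimizer step consumes per_device_batch * grad_accum examples.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_pairs / effective_batch)
total_steps = steps_per_epoch * epochs            # 250 * 3 = 750

# Wall-clock time per optimizer step implied by the reported training time.
training_hours = 4.24
seconds_per_step = training_hours * 3600 / total_steps   # about 20.35 s

# Share of weights that the LoRA adapters actually train.
trainable_params = 68.8e6
total_params = 14.84e9
trainable_pct = 100 * trainable_params / total_params    # about 0.46 %

print(total_steps, round(seconds_per_step, 2), round(trainable_pct, 2))
```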
The seed prompts were drawn from the human-written Databricks Dolly-15k dataset (filtered to remove items requiring an attached context passage, then deduplicated). The teacher was queried with thinking disabled so the student learns clean final answers rather than chain-of-thought.
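The prompt preparation described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes rows shaped like Dolly-15k records (with `instruction` and `context` fields), deduplicates on a whitespace- and case-normalized instruction, and uses a fixed, hypothetical seed to make the subset deterministic.

```python
import random


def select_prompts(rows, subset_size, seed=0):
    """Drop context-dependent rows, deduplicate, then take a reproducible subset."""
    seen = set()
    clean = []
    for row in rows:
        # Rows with a context passage cannot be answered from the prompt alone.
        if (row.get("context") or "").strip():
            continue
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(row["instruction"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        clean.append(row["instruction"].strip())
    if subset_size >= len(clean):
        return clean
    # A seeded generator returns the same subset on every run.
    return random.Random(seed).sample(clean, subset_size)


rows = [
    {"instruction": "What is GGUF?", "context": ""},
    {"instruction": "Summarize the passage.", "context": "Some long passage."},
    {"instruction": "what is  gguf?", "context": ""},
    {"instruction": "Name a planet.", "context": ""},
]
prompts = select_prompts(rows, subset_size=10)
```

With the sample rows above, the context-dependent item and the near-duplicate are removed, leaving two prompts.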
Notes for re-builders
- 4-bit QLoRA via Unsloth with gradient checkpointing ("unsloth" mode), max_seq_length 2048, per-device batch 1 × grad-accum 8, paged_adamw_8bit, LoRA targeting all attention + MLP projections. Peak VRAM held at 12.05 GB on a 16 GB card.
- GGUF conversion was done via streaming LoRA merge → f16 GGUF (28 GB intermediate) → Q4_K_M quantize (8.4 GB, 4.87 bpw) with llama.cpp.
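The adapter size and file sizes quoted in these notes can be approximately reproduced from first principles. The sketch below assumes the architecture values from the published Qwen2.5-14B configuration (hidden size 5120, MLP size 13824, 48 layers, 8 key/value heads of dimension 128); those values are not stated in this card. It also assumes the "GB" figures are binary gigabytes (GiB) and reuses the 14.84B parameter count from the training table, so the file sizes are estimates.

```python
HIDDEN = 5120          # assumed hidden size of Qwen2.5-14B
INTERMEDIATE = 13824   # assumed MLP intermediate size
LAYERS = 48            # assumed number of transformer layers
KV_HEADS = 8           # assumed number of key/value heads (grouped-query attention)
HEAD_DIM = 128         # assumed per-head dimension
RANK = 16              # LoRA rank from the training table

kv_dim = KV_HEADS * HEAD_DIM

# (input features, output features) of each LoRA-targeted projection.
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, kv_dim),
    "v_proj": (HIDDEN, kv_dim),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# A rank-r adapter adds an (in x r) and an (r x out) matrix per projection.
lora_params = LAYERS * sum(RANK * (n_in + n_out) for n_in, n_out in projections.values())

TOTAL_PARAMS = 14.84e9
GIB = 2**30
f16_gib = TOTAL_PARAMS * 16 / 8 / GIB     # 16 bits per weight
q4_gib = TOTAL_PARAMS * 4.87 / 8 / GIB    # 4.87 bits per weight on average

print(f"{lora_params:,} LoRA params, f16 ~{f16_gib:.1f} GiB, Q4_K_M ~{q4_gib:.1f} GiB")
```

Under these assumptions the adapter count comes to 68,812,800, in line with the reported 68.8M trainable parameters, and the estimated sizes land close to the 28 GB intermediate and 8.4 GB final files.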
Intended use
General-purpose local assistant: explanations, reasoning, Q&A, and tool-calling workflows where a capable, private, offline-capable model is preferred over a hosted API. Not intended as a source of authoritative facts without retrieval.
License
Apache-2.0, inherited from the Qwen2.5-14B-Instruct base model. You are free to use, modify, and redistribute under the terms of that license. Distilled training data was generated using glm-5.2; users should review the teacher model's terms for their own use case.
Built at theLAB: Learning. Algorithms. Breakthroughs.