Instructions for using AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with libraries and local apps.

Libraries

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with llama-cpp-python:

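# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q4_K_M file from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF",
	filename="adi-qwen2.5-14b-glm5.2-general-q4_k_m.gguf",
)

# Run a single chat turn and print the assistant's reply.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])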
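
For longer replies it can be more pleasant to stream text as it is generated. The sketch below assumes an `llm` object like the one created above and relies on `create_chat_completion(..., stream=True)`, which yields OpenAI-style chunks. The helper names `iter_text` and `stream_reply` are hypothetical, not part of llama-cpp-python.

```python
from typing import Any, Iterable, Iterator


def iter_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield only the text pieces from OpenAI-style streaming chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:  # the first chunk usually carries only the role, the last may be empty
            yield text


def stream_reply(llm: Any, prompt: str, max_tokens: int = 256) -> str:
    """Stream one chat turn, echoing text as it arrives (hypothetical helper)."""
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    pieces = []
    for text in iter_text(chunks):
        print(text, end="", flush=True)
        pieces.append(text)
    print()
    return "".join(pieces)
```

With the model loaded, `stream_reply(llm, "What is the capital of France?")` prints the reply incrementally and returns the full text.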

Local Apps

llama.cpp

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Use Docker

docker model run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

vLLM

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
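
The same OpenAI-compatible request can be sent from Python using only the standard library. This is a minimal sketch: the function names are hypothetical, and the base URL is an assumption that depends on which server is running (port 8000 for the vLLM command above; the llama.cpp examples elsewhere on this page use port 8080).

```python
import json
import urllib.request

MODEL_ID = "AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000/v1", "What is the capital of France?")` returns the reply as a string.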

Use Docker

docker model run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Ollama
How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with Ollama:
```
ollama run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
```

Unsloth Studio

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF to start chatting

Pi

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
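
If `~/.pi/agent/models.json` already lists other providers, pasting the JSON above over it would drop them. The sketch below merges the `llama-cpp` entry into whatever is already there. The helper names are hypothetical, and it assumes the file, if present, is plain JSON with the top-level `providers` object shown above.

```python
import json
from pathlib import Path
from typing import Optional

MODEL_ID = "AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M"


def with_llama_cpp_provider(config: dict, base_url: str = "http://localhost:8080/v1") -> dict:
    """Return a copy of `config` with the llama-cpp provider entry set."""
    merged = json.loads(json.dumps(config))  # simple deep copy for JSON data
    providers = merged.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": MODEL_ID}],
    }
    return merged


def update_models_file(path: Optional[Path] = None) -> None:
    """Read, merge, and rewrite the Pi models file (created if missing)."""
    if path is None:
        path = Path.home() / ".pi" / "agent" / "models.json"
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_llama_cpp_provider(existing), indent=2) + "\n")
```

Calling `update_models_file()` once after installing Pi leaves other provider entries untouched.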

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with Docker Model Runner:
```
docker model run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M
```

Lemonade

How to use AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.adi-qwen2.5-14b-glm5.2-general-GGUF-Q4_K_M

List all available models

lemonade list

adi-qwen2.5-14b-glm5.2-general

Part of the ADI (Advanced Data Intelligence) model line — ADI Qwen series.

A compact, fully local model trained to reason and answer like its frontier teacher. It was built by distilling glm-5.2 general-knowledge responses into a Qwen2.5-14B-Instruct student with a 4-bit QLoRA fine-tune; the adapter was then merged into the base weights, converted, and quantized to GGUF. It is the largest general ADI model to date: it has more parametric headroom than the 8B, yet is still small enough to run on a single 16 GB consumer GPU. The student base retains native tool calling and a long context window.


Base model	Qwen/Qwen2.5-14B-Instruct
Teacher	glm-5.2 (responses distilled, thinking disabled)
Method	4-bit QLoRA SFT (rank 16) → merge → GGUF
Quantization	Q4_K_M (~8.4 GB, 4.87 bpw)
License	Apache-2.0 (inherited from Qwen2.5-14B)
Context	128K (inherited from base)
Tool calling	Supported (inherited from base)
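
The file size follows roughly from parameter count times bits per weight. The sketch below is a back-of-the-envelope check, not a measurement. It assumes the merged model has about 14.84B minus 68.8M parameters (the totals reported in the training table, with the LoRA adapter folded in), that the sizes on this card are in GiB, and it ignores GGUF metadata overhead.

```python
GIB = 1024 ** 3


def gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GiB for a given average bits per weight."""
    return n_params * bits_per_weight / 8 / GIB


# Assumption: 14.84B is the total with adapters attached, so subtract the 68.8M LoRA params.
n_params = 14.84e9 - 68.8e6

q4_size = gguf_size_gib(n_params, 4.87)   # Q4_K_M average from the table above
f16_size = gguf_size_gib(n_params, 16.0)  # unquantized f16 intermediate

print(f"Q4_K_M ~ {q4_size:.1f} GiB, f16 ~ {f16_size:.1f} GiB")
```

Under these assumptions the estimate lands close to the ~8.4 GB Q4_K_M figure above and the roughly 28 GB f16 intermediate mentioned in the rebuild notes.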

Run it

Pull directly into Ollama:

ollama run hf.co/AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF:Q4_K_M

Or download the .gguf and point any llama.cpp-based runtime at it.

What this model is

This is a knowledge distillation: a strong teacher (glm-5.2) generated high-quality answers across a clean general-knowledge prompt set, and the Qwen2.5-14B-Instruct student was fine-tuned to imitate them. The result reasons and responds noticeably more like its teacher on general topics, with the most headroom of any general model in the ADI line, while still fitting on a single consumer GPU.

What distillation does — and doesn't do. It transfers the teacher's reasoning style and answer quality, not net-new facts. A 14B model carries more parametric knowledge than the smaller ADI students, but it still isn't an encyclopedia. For raw factual recall, retrieval-augmented generation (RAG) is the right tool, not fine-tuning. What you get here is a 14B that structures and explains like a much larger model on topics it already partly knows.
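
To illustrate the RAG suggestion, here is a deliberately tiny sketch: it ranks candidate passages by word overlap with the question and places the best ones in the system prompt. Everything in it is an assumption made for illustration (the function names, the overlap scoring, the prompt wording); a real setup would normally use an embedding index instead.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_passages(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(question_words & tokenize(p)), reverse=True)
    return ranked[:k]


def build_messages(question: str, passages: List[str], k: int = 2) -> List[Dict[str, str]]:
    """Assemble chat messages that ground the answer in retrieved text."""
    context = "\n\n".join(top_passages(question, passages, k))
    system = (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\nContext:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list can be passed as `messages` to any of the chat-completion calls shown earlier on this page.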

Training

Metric	Value
Training pairs	2,000 (deterministic subset of a 4,982-pair clean set)
Teacher tokens generated	~3.58M output tokens
Epochs	3
Steps	750
Final train loss	0.9086 (mean over the run; individual steps reached ~0.74)
LoRA rank / alpha	16 / 16
Trainable params	68.8M (0.46% of 14.84B)
Precision	4-bit QLoRA (nf4)
Peak VRAM	12.05 GB
Hardware	single RTX 5060 Ti (16 GB)
Training time	4.24 h (~20 s/step)
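
The step count, time per step, and trainable fraction in this table can be cross-checked with a few lines of arithmetic. The batch settings (per-device batch 1, gradient accumulation 8) are taken from the notes for re-builders further down; the variable names are illustrative only.

```python
import math

pairs = 2_000          # training pairs
epochs = 3
per_device_batch = 1   # from the rebuild notes
grad_accum = 8         # from the rebuild notes
hours = 4.24           # reported training time

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(pairs / effective_batch)
total_steps = steps_per_epoch * epochs
seconds_per_step = hours * 3600 / total_steps
trainable_pct = 68.8e6 / 14.84e9 * 100

print(f"steps: {total_steps}")
print(f"seconds per step: {seconds_per_step:.1f}")
print(f"trainable: {trainable_pct:.2f}%")
```

The results agree with the table: 750 steps, about 20 s per step, and 0.46% of parameters trainable.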

The seed prompts were drawn from the human-written Databricks Dolly-15k dataset (filtered to remove items requiring an attached context passage, then deduplicated). The teacher was queried with thinking disabled so the student learns clean final answers rather than chain-of-thought.
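
The filtering described above can be sketched as follows. This is an illustration, not the actual preprocessing script: it assumes Dolly-15k's `instruction` and `context` fields, treats prompts as duplicates when they match after lowercasing and whitespace normalization, and picks a deterministic subset by hash order, which is only one possible way to make the selection reproducible.

```python
import hashlib
from typing import Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical prompts compare equal."""
    return " ".join(text.lower().split())


def clean_prompts(records: Iterable[Dict[str, str]]) -> List[str]:
    """Drop records that need a context passage, then deduplicate instructions."""
    seen = set()
    kept = []
    for record in records:
        if record.get("context", "").strip():
            continue  # prompt depends on an attached passage
        key = normalize(record["instruction"])
        if key and key not in seen:
            seen.add(key)
            kept.append(record["instruction"].strip())
    return kept


def deterministic_subset(prompts: List[str], n: int) -> List[str]:
    """Pick n prompts in an order that does not depend on input order (assumed method)."""
    ranked = sorted(prompts, key=lambda p: hashlib.sha256(p.encode("utf-8")).hexdigest())
    return ranked[:n]
```

Applied to the full dataset, `deterministic_subset(clean_prompts(records), 2000)` would return the same 2,000 prompts on every run.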

Notes for re-builders

Training used 4-bit QLoRA via Unsloth with gradient checkpointing ("unsloth" mode), max_seq_length 2048, per-device batch size 1 with gradient accumulation 8 (effective batch size 8), the paged_adamw_8bit optimizer, and LoRA adapters on all attention and MLP projections. Peak VRAM held at 12.05 GB on a 16 GB card.
GGUF conversion was done with llama.cpp: streaming LoRA merge → f16 GGUF (28 GB intermediate) → Q4_K_M quantize (8.4 GB, 4.87 bpw).
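
The 68.8M trainable-parameter figure can be reproduced from the LoRA setup: a rank-r adapter on a linear layer of shape d_in × d_out adds r × (d_in + d_out) weights. The layer dimensions below are assumptions taken from the published Qwen2.5-14B configuration (hidden size 5120, MLP size 13824, 48 layers, 8 key/value heads of dimension 128) and should be checked against the base model's config.json.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed Qwen2.5-14B dimensions; verify against the base model's config.json.
hidden = 5120
intermediate = 13824
layers = 48
kv_dim = 8 * 128  # 8 key/value heads x head dimension 128
rank = 16

# All attention and MLP projections, as (d_in, d_out).
projections = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in projections.values())
total = per_layer * layers
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the total is 68,812,800, which matches the 68.8M reported in the training table.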

Intended use

General-purpose local assistant: explanations, reasoning, Q&A, and tool-calling workflows where a capable, private, offline-capable model is preferred over a hosted API. Not intended as a source of authoritative facts without retrieval.
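
For the tool-calling workflows mentioned above, a request in the OpenAI-compatible format might look like the sketch below. The `get_weather` tool and both helper functions are hypothetical, and whether a given local runtime parses tool calls out of the box depends on that runtime and its chat-template settings, so treat this as a starting point rather than a guaranteed recipe.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool definition in the OpenAI "tools" schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(model: str, prompt: str, tools: List[dict]) -> dict:
    """Chat-completions payload that advertises the available tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }


def dispatch_tool_calls(message: dict, registry: Dict[str, Callable]) -> List[dict]:
    """Run each requested tool and return `tool` messages for the next turn."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = registry[function["name"]](**arguments)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)}
        )
    return results
```

The messages returned by `dispatch_tool_calls` are appended to the conversation, after the assistant message that requested them, before asking the model for its final answer.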

License

Apache-2.0, inherited from the Qwen2.5-14B-Instruct base model. You are free to use, modify, and redistribute under the terms of that license. Distilled training data was generated using glm-5.2; users should review the teacher model's terms for their own use case.

Built at theLAB — Learning. Algorithms. Breakthroughs.

GGUF

Model size	15B params
Architecture	qwen2
Hardware compatibility	4-bit

Model tree for AdvancedDataIntelligence/adi-qwen2.5-14b-glm5.2-general-GGUF

Base model	Qwen/Qwen2.5-14B
Finetuned	Qwen/Qwen2.5-14B-Instruct
Quantized	this model