Qwen2.5-Coder-7B Text-to-SQL — GGUF quants

GGUF quantizations of junmingg/qwen2.5-coder-7b-text2sql for CPU/GPU inference via llama.cpp, Ollama, LM Studio, etc.

File Quant Size Notes
qwen2.5-coder-7b-instruct.Q4_K_M.gguf Q4_K_M ~4.7 GB 4-bit, best size/quality balance (recommended)
qwen2.5-coder-7b-instruct.Q6_K.gguf Q6_K ~6.3 GB 6-bit, near-Q8 quality, smaller
qwen2.5-coder-7b-instruct.Q8_0.gguf Q8_0 ~8.1 GB 8-bit, near-lossless

See the main model card for results (exact 78.8% / semantic 86.2% / validity 99.2% vs base 3.8 / 67.0 / 100), training details, and the required system prompt — the model expects the text-to-SQL system message + ChatML format.

Quick start (Ollama)

ollama run hf.co/junmingg/qwen2.5-coder-7b-text2sql-GGUF:Q4_K_M

Quick start (llama.cpp)

llama-cli -hf junmingg/qwen2.5-coder-7b-text2sql-GGUF:Q4_K_M

License: Apache-2.0 (base model) / data CC-BY-4.0.

Downloads last month
97
GGUF
Model size
8B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for junmingg/qwen2.5-coder-7b-text2sql-GGUF

Base model

Qwen/Qwen2.5-7B
Quantized
(1)
this model

Dataset used to train junmingg/qwen2.5-coder-7b-text2sql-GGUF