Qwen3-4B GGUF Quants

Quantized GGUF versions of Qwen/Qwen3-4B, Alibaba's Qwen3 model at the 4B parameter scale. Qwen3 is the generation that follows Qwen2.5, with improved reasoning, coding, and instruction following in an efficient sub-5B footprint.

Available Files

File                     Quant     Size     Use Case
Qwen3-4B-Q8_0.gguf       Q8_0      ~4.5GB   Maximum quality
Qwen3-4B-Q6_K.gguf       Q6_K      ~3.5GB   Near-lossless
Qwen3-4B-Q5_K_M.gguf     Q5_K_M    ~3.1GB   High quality
Qwen3-4B-Q4_K_M.gguf     Q4_K_M    ~2.6GB   Recommended default
Qwen3-4B-Q3_K_M.gguf     Q3_K_M    ~2.1GB   Low VRAM
Qwen3-4B-IQ4_XS.gguf     IQ4_XS    ~2.4GB   Imatrix 4-bit
Qwen3-4B-IQ3_XXS.gguf    IQ3_XXS   ~1.8GB   Imatrix 3-bit
Qwen3-4B-IQ2_M.gguf      IQ2_M     ~1.6GB   Imatrix 2-bit
Qwen3-4B-IQ1_S.gguf      IQ1_S     ~1.2GB   Extreme compression
Qwen3-4B-fp16.gguf       FP16      ~8.0GB   Full precision
imatrix.dat              -         -        Importance matrix
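
The sizes listed above can drive a simple selection rule: take the largest file that fits your memory budget after reserving some headroom. The sketch below is one way to do that in POSIX shell. The helper name `pick_quant` and the 1024 MB default headroom are assumptions made for illustration, and the sizes are the approximate figures from the list, not measured values.

```shell
# Approximate file sizes in MB, copied from the file list, largest first.
QUANTS="fp16:8000 Q8_0:4500 Q6_K:3500 Q5_K_M:3100 Q4_K_M:2600 IQ4_XS:2400 Q3_K_M:2100 IQ3_XXS:1800 IQ2_M:1600 IQ1_S:1200"

# pick_quant BUDGET_MB [HEADROOM_MB]   (hypothetical helper)
# Prints the largest file whose size plus headroom fits the budget.
# Returns 1 and prints nothing if no file fits.
pick_quant() {
  budget_mb=$1
  headroom_mb=${2:-1024}   # assumed allowance for KV cache and buffers
  for entry in $QUANTS; do
    name=${entry%%:*}      # text before the colon
    size_mb=${entry##*:}   # text after the colon
    if [ $((size_mb + headroom_mb)) -le "$budget_mb" ]; then
      printf 'Qwen3-4B-%s.gguf\n' "$name"
      return 0
    fi
  done
  return 1
}

pick_quant 4096   # 4 GB budget: prints Qwen3-4B-Q4_K_M.gguf
```

Size is only a proxy for quality here. The imatrix quants (IQ*) sit between the K-quants in size, so a strict largest-first rule may not match your preference between, say, IQ4_XS and Q3_K_M on a tight budget.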

Usage

# llama.cpp
./llama-cli -m Qwen3-4B-Q4_K_M.gguf \
  --ctx-size 8192 -n 512 \
  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"

# Ollama
ollama run hf.co/DuoNeural/Qwen3-4B-GGUF:Q4_K_M
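
The llama.cpp prompt above is written in ChatML, with `\n` escapes typed inline. As a minimal sketch, the same prompt can be generated with real newlines and passed through a file. The helper name `chatml_prompt` and the file name `prompt.txt` are assumptions for illustration. The `-f` option is llama-cli's prompt-file flag, but behavior around chat templates varies between llama.cpp versions, so check `./llama-cli --help` for your build.

```shell
# chatml_prompt SYSTEM USER   (hypothetical helper)
# Prints a single-turn ChatML prompt with real newlines, ending at the
# assistant header so the model continues from there.
chatml_prompt() {
  printf '<|im_start|>system\n%s<|im_end|>\n' "$1"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$2"
  printf '<|im_start|>assistant\n'
}

# Write to a file instead of using $(...), which would strip the final newline.
chatml_prompt "You are a helpful assistant." "Hello!" > prompt.txt

# Then, with llama.cpp built (not executed in this sketch):
# ./llama-cli -m Qwen3-4B-Q4_K_M.gguf --ctx-size 8192 -n 512 -f prompt.txt
```

Using `%s` for the message text keeps any `%` or backslash in the user input from being interpreted by `printf`.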

About Qwen3-4B

  • Parameters: 4B
  • Architecture: Qwen3 decoder-only transformer
  • License: Apache 2.0
  • Strengths: Reasoning, coding, instruction following, multilingual
  • Ideal for: Hardware with 4-6GB VRAM, edge deployment, CPU inference

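At Q4_K_M (~2.6GB), the model fits on most modern GPUs and can also run on CPU-only systems. Allow extra memory beyond the file size for the KV cache, which grows with context length.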
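
The ~2.6GB figure covers the weights only. A rough estimate of the additional KV cache memory is 2 (keys and values) x layers x KV heads x head dimension x bytes per element x context length. The sketch below computes that in shell. The helper name `kv_cache_mib` is hypothetical, and the defaults (36 layers, 8 KV heads, head dimension 128, 2-byte f16 cache) are assumptions about Qwen3-4B that should be checked against the model's config or the GGUF metadata before relying on the result.

```shell
# kv_cache_mib CTX [LAYERS] [KV_HEADS] [HEAD_DIM] [BYTES_PER_ELEM]   (hypothetical helper)
# Prints the approximate KV cache size in MiB (integer division, rounds down).
kv_cache_mib() {
  ctx=$1
  layers=${2:-36}      # assumed layer count
  kv_heads=${3:-8}     # assumed number of key/value heads
  head_dim=${4:-128}   # assumed per-head dimension
  bytes=${5:-2}        # 2 bytes per element for an f16 cache
  echo $(( 2 * layers * kv_heads * head_dim * bytes * ctx / 1048576 ))
}

kv_cache_mib 8192   # prints 1152 with the assumed defaults
```

Under those assumptions, the 8192-token context from the usage example adds a little over 1GB on top of the weights, which is consistent with the 4-6GB VRAM guidance. The estimate ignores compute buffers and any KV cache quantization, so treat it as a lower bound.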


Quantized by DuoNeural using llama.cpp on an RTX 5090.


DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).
