Atomic Chat Join Discord GitHub

Qwen3 Coder 30B A3B

Qwen3 Coder 30B A3B, self-quantized to GGUF by Atomic Chat. Built straight from Qwen's original weights with a per-tensor importance matrix. Runs fully offline.

Highlights

  • Agentic coding specialist with significant performance among open models on agentic coding, agentic browser-use, and other foundational coding tasks.
  • Efficient MoE: 30.5B total parameters, only 3.3B activated per token (128 experts, 8 activated).
  • 256K native context (262,144 tokens), extendable up to ~1M tokens with Yarn, optimized for repository-scale understanding.
  • Tool calling built in with a specially designed function-call format, supporting platforms such as Qwen Code and CLINE.
  • Non-thinking mode only — does not emit <think></think> blocks; no enable_thinking flag required.
  • Full quant ladder with an importance matrix on every quant over calibration_datav3.

These GGUFs are self-quantized from the original weights, not a repack. The importance matrix keeps low-bit quants closer to the full-precision model.

Always pass --jinja so the Qwen3 Coder 30B A3B chat template is applied. Without it the model can emit malformed turns.

Model Overview

Property Value
Base model Qwen/Qwen3-Coder-30B-A3B-Instruct
Total / active parameters 30.5B total, 3.3B activated (128 experts, 8 activated)
Layers 48
Context length 262,144 native (extendable to ~1M with Yarn)
Architecture Causal LM, Mixture-of-Experts; GQA (32 Q heads, 4 KV heads)
This repo GGUF quants (imatrix)

See the official model card for Qwen's published benchmark results.

Choosing a quant

Quant Size Notes
Q2_K 11.3 GB Smallest. Minimal RAM, clear quality drop.
IQ3_M 13.5 GB Beats Q3 at similar size thanks to imatrix. Best low-RAM pick.
Q3_K_M 14.7 GB Low quality but usable.
Q3_K_L 15.9 GB A step above Q3_K_M.
IQ4_XS 16.4 GB Excellent quality for size. Recommended low-bit.
Q4_K_S 17.5 GB Compact Q4, fast.
Q4_K_M 18.6 GB Recommended default. Best balance of size, speed and quality.
UD-Q4_K_XL 18.8 GB Dynamic. Embeddings and output kept at Q8_0 for higher quality at a Q4 footprint.
Q5_K_S 19.7 GB Higher quality.
Q5_K_M 12.1 GB Higher quality, low loss.
Q6_K 17.4 GB Near lossless.
Q8_0 20.3 GB Effectively lossless, reference quality.

Pick the largest file that fits your (V)RAM with room for context. Q4_K_M or UD-Q4_K_XL is the sweet spot for most setups; Q6_K or Q8_0 for maximum fidelity.

Get started

Run Qwen3 Coder 30B A3B locally with:

  • Atomic Chat: the easiest path. Open the app, search AtomicChat/qwen3-coder-30b-a3b-GGUF, pick a quant, hit Use this model.
  • llama.cpp: llama-server -hf AtomicChat/qwen3-coder-30b-a3b-GGUF:Q4_K_M --jinja -c 8192
  • Ollama: ollama run hf.co/AtomicChat/qwen3-coder-30b-a3b-GGUF:Q4_K_M
  • LM Studio / Jan: search the repo id, download any quant.

Best practices

Parameter Value
temperature 0.7
top_p 0.8
top_k 20
repetition_penalty 1.05

Qwen's recommended settings for this model (non-thinking); recommended output length 65,536 tokens.

Run in llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --target llama-cli llama-server
./llama.cpp/build/bin/llama-server \
    -hf AtomicChat/qwen3-coder-30b-a3b-GGUF:UD-Q4_K_XL \
    --jinja -ngl 99 -c 8192 -fa on

How these were made

  1. Download Qwen/Qwen3-Coder-30B-A3B-Instruct (original weights).
  2. Convert to f16 GGUF with llama.cpp.
  3. Build an importance matrix over calibration_datav3 (100 chunks).
  4. Quantize the full ladder with --imatrix.
  5. UD-Q4_K_XL additionally pins the token-embedding and output tensors to Q8_0.

License

Released by Qwen under the Apache 2.0 license. Quantized by Atomic Chat.

Downloads last month
1,157
GGUF
Model size
31B params
Architecture
qwen3moe
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AtomicChat/qwen3-coder-30b-a3b-GGUF

Quantized
(149)
this model

Collection including AtomicChat/qwen3-coder-30b-a3b-GGUF