Atomic Chat Join Discord GitHub

Qwen3.6 27B

Qwen3.6 27B, self-quantized to GGUF by Atomic Chat. Built straight from Qwen's original weights with a per-tensor importance matrix. Runs fully offline.

Highlights

  • First open-weight Qwen3.6 variant, following the Qwen3.5 series, with a focus on stability and real-world utility.
  • Agentic coding that handles frontend workflows and repository-level reasoning with greater fluency and precision.
  • Thinking Preservation, a new option to retain reasoning context from historical messages to streamline iterative development.
  • Base model is multimodal (vision encoder); these GGUF quants cover the text path.
  • 262,144-token native context, extensible up to ~1,010,000 tokens.
  • Full quant ladder with an importance matrix on every quant over calibration_datav3.

These GGUFs are self-quantized from the original weights, not a repack. The importance matrix keeps low-bit quants closer to the full-precision model.

Always pass --jinja so the Qwen3.6 27B chat template is applied. Without it the model can emit malformed turns.

Model Overview

Property Value
Base model Qwen/Qwen3.6-27B
Total parameters 27B
Layers 64
Context length 262,144 native, extensible up to ~1,010,000
Architecture Causal LM with vision encoder (Gated DeltaNet + Gated Attention)
This repo GGUF quants (imatrix), text path
Qwen3.6 27B benchmark scores

Scores are Qwen's published results for the base Qwen/Qwen3.6-27B. Quantization preserves the large majority of this; Q4_K_M and up sit within a point or two of full precision.

Choosing a quant

Quant Size Notes
Q2_K 10.7 GB Smallest. Minimal RAM, clear quality drop.
IQ3_M 12.6 GB Beats Q3 at similar size thanks to imatrix. Best low-RAM pick.
Q3_K_M 13.3 GB Low quality but usable.
Q3_K_L 14.3 GB A step above Q3_K_M.
IQ4_XS 15.1 GB Excellent quality for size. Recommended low-bit.
Q4_K_S 15.6 GB Compact Q4, fast.
Q4_K_M 16.5 GB Recommended default. Best balance of size, speed and quality.
UD-Q4_K_XL 17.5 GB Dynamic. Embeddings and output kept at Q8_0 for higher quality at a Q4 footprint.
Q5_K_S 18.7 GB Higher quality.
Q5_K_M 19.2 GB Higher quality, low loss.
Q6_K 22.1 GB Near lossless.
Q8_0 28.6 GB Effectively lossless, reference quality.

Pick the largest file that fits your (V)RAM with room for context. Q4_K_M or UD-Q4_K_XL is the sweet spot for most setups; Q6_K or Q8_0 for maximum fidelity.

Get started

Run Qwen3.6 27B locally with:

  • Atomic Chat: the easiest path. Open the app, search AtomicChat/qwen36-27b-GGUF, pick a quant, hit Use this model.
  • llama.cpp: llama-server -hf AtomicChat/qwen36-27b-GGUF:Q4_K_M --jinja -c 8192
  • Ollama: ollama run hf.co/AtomicChat/qwen36-27b-GGUF:Q4_K_M
  • LM Studio / Jan: search the repo id, download any quant.

Best practices

Parameter Value
temperature 0.7
top_p 0.8
top_k 20
min_p 0.0
presence_penalty 1.5
repetition_penalty 1.0

Qwen's recommended Instruct (non-thinking) settings. Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0.

Run in llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --target llama-cli llama-server
./llama.cpp/build/bin/llama-server \
    -hf AtomicChat/qwen36-27b-GGUF:UD-Q4_K_XL \
    --jinja -ngl 99 -c 8192 -fa on

How these were made

  1. Download Qwen/Qwen3.6-27B (original weights).
  2. Convert to f16 GGUF with llama.cpp.
  3. Build an importance matrix over calibration_datav3 (100 chunks).
  4. Quantize the full ladder with --imatrix.
  5. UD-Q4_K_XL additionally pins the token-embedding and output tensors to Q8_0.

License

Released by Qwen under the Apache 2.0 license. Quantized by Atomic Chat.

Downloads last month
491
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AtomicChat/qwen36-27b-GGUF

Base model

Qwen/Qwen3.6-27B
Quantized
(487)
this model

Collection including AtomicChat/qwen36-27b-GGUF