Qwen3.6-27B-int4-AutoRound

This is an INT4 AutoRound quantization of Qwen/Qwen3.6-27B, produced with spark-auto-round.

Quantization Details

Parameter	Value
Original Model	Qwen/Qwen3.6-27B
Quantization Method	AutoRound (W4A16, symmetric)
Bits	4
Group Size	128
Calibration Dataset	opencode-instruct
Calibration Samples	512
Calibration Sequence Length	2048
Tuning Iterations	1000
Batch Size	8
Packing Format	`auto_round:auto_gptq`
AutoRound Version	0.14.2
Model Size	~19 GB
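
To make the W4A16, symmetric, group-size-128 settings concrete, here is a minimal sketch of group-wise symmetric 4-bit quantization. It uses plain round-to-nearest with a max-abs scale, which is an assumption made for illustration: AutoRound tunes the rounding during calibration, so this is not the procedure that produced these weights. All function names are hypothetical.

```python
# Illustrative only: plain round-to-nearest, not AutoRound's tuned rounding.
GROUP_SIZE = 128              # "Group Size" from the table above
BITS = 4                      # "Bits" from the table above
QMAX = 2 ** (BITS - 1) - 1    # largest signed 4-bit code (7)
QMIN = -(2 ** (BITS - 1))     # smallest signed 4-bit code (-8)


def quantize_group(weights):
    """Symmetric quantization of one group: a single scale, no zero point."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Recover approximate weights; activations stay in 16-bit (the A16 part)."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into groups and quantize each one independently."""
    return [
        quantize_group(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


def bits_per_weight(group_size=GROUP_SIZE, bits=BITS, scale_bits=16):
    """Storage per weight: the 4-bit code plus an amortized 16-bit scale."""
    return bits + scale_bits / group_size
```

A smaller group size gives each scale fewer weights to cover, which lowers the rounding error but raises the per-weight storage returned by `bits_per_weight`.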

Layers Kept in FP16

The linear_attn.in_proj_a and linear_attn.in_proj_b projections in every DeltaNet layer, as well as mtp.fc, are kept in FP16 to preserve quality.
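
As a rough sketch of how such an exclusion rule could be expressed, the helper below checks a module name against the three suffixes named above. The helper name and the example module paths in the comments are hypothetical; they are not read from the checkpoint or from spark-auto-round.

```python
# Suffixes come from the paragraph above. Full module paths such as
# "model.layers.0.linear_attn.in_proj_a" are hypothetical examples.
FP16_SUFFIXES = ("linear_attn.in_proj_a", "linear_attn.in_proj_b", "mtp.fc")


def kept_in_fp16(module_name):
    """True if the module name is, or ends with, one of the excluded suffixes."""
    return any(
        module_name == suffix or module_name.endswith("." + suffix)
        for suffix in FP16_SUFFIXES
    )
```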

Quantization Report

All 64 transformer blocks were evaluated in the sensitivity analysis: 63 PASS and 1 WARN (layer 58).

Layer Range	Cosine Similarity	PSNR (dB)
Layers 0-10	0.9999 - 1.0000	80.7 - 84.0
Layers 11-20	0.9995 - 0.9999	74.9 - 81.5
Layers 21-30	0.9988 - 0.9995	73.6 - 78.7
Layers 31-40	0.9976 - 0.9986	69.4 - 73.2
Layers 41-50	0.9943 - 0.9976	60.2 - 69.2
Layers 51-63	0.9883 - 0.9934	53.4 - 66.5

Full per-layer reports are available in the repository: quantization-report.txt and quantization-report.csv.
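
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them between a reference block output and its quantized counterpart. How the report defines the PSNR peak is not stated here, so using the largest absolute reference value is an assumption, and both function names are hypothetical.

```python
import math


def cosine_similarity(ref, test):
    """Cosine of the angle between two flattened output vectors."""
    dot = sum(r * t for r, t in zip(ref, test))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_test = math.sqrt(sum(t * t for t in test))
    return dot / (norm_ref * norm_test)


def psnr_db(ref, test):
    """PSNR in dB. Assumption: the peak is the largest absolute reference value."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical outputs
    peak = max(abs(r) for r in ref)
    return 10 * math.log10(peak ** 2 / mse)
```

Both metrics fall as quantization error grows, which matches the gradual decline from the early layers to the late layers in the table.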

How to Use

With vLLM

vllm serve coder3101/Qwen3.6-27B-int4-AutoRound
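
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the server listens on vLLM's default address, http://localhost:8000; adjust `BASE_URL` if it was started with a different host or port. The helper names are hypothetical.

```python
import json
import urllib.request

MODEL = "coder3101/Qwen3.6-27B-int4-AutoRound"
# Assumption: the server was started with the command above on the default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt, max_tokens=512):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(ask("Explain the theory of relativity in simple terms."))
```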

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "coder3101/Qwen3.6-27B-int4-AutoRound"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Build the prompt with the model's chat template
prompt = "Explain the theory of relativity in simple terms."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate, then decode only the new tokens so the prompt is not echoed back
outputs = model.generate(**inputs, max_new_tokens=512)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Acknowledgments

Quantization performed using spark-auto-round by @whpthomas
Based on AutoRound by Intel

Original Model Card -- Qwen3.6-27B

Below is the original model card from Qwen/Qwen3.6-27B.

Qwen3.6-27B

Highlights

Qwen3.6-27B follows the Qwen3.5 series with key upgrades:

Agentic Coding: Handles frontend workflows and repo-level reasoning with greater fluency.
Thinking Preservation: New option to retain reasoning context from historical messages, reducing overhead in iterative development.

Model Architecture

Property	Value
Type	Causal Language Model with Vision Encoder
Parameters	27B
Hidden Dimension	5120
Token Embedding	248320 (Padded)
Number of Layers	64
Hidden Layout	`16 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN))`
FFN Intermediate Dimension	17408
Context Length	262,144 (natively), up to 1,010,000 with YaRN

Gated DeltaNet: 48 linear attention heads for V, 16 for QK (head dim: 128)
Gated Attention: 24 heads for Q, 4 for KV (head dim: 256, RoPE dim: 64)
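
The hidden layout string can be unrolled into a per-layer list to see how the 64 layers split between the two mixer types. This is a small sketch derived only from the layout pattern above; the function name and the layer-type labels are hypothetical, and the zero-based indexing is an assumption about how layers are counted.

```python
def expand_layout(repeats=16, deltanet_per_block=3, attention_per_block=1):
    """Unroll 16 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN))."""
    layers = []
    for _ in range(repeats):
        layers += ["gated_deltanet"] * deltanet_per_block
        layers += ["gated_attention"] * attention_per_block
    return layers


layers = expand_layout()
# Zero-based positions of the full (gated) attention layers
attention_indices = [i for i, kind in enumerate(layers) if kind == "gated_attention"]
print(len(layers), layers.count("gated_deltanet"), layers.count("gated_attention"))
```

Under this pattern every fourth layer uses Gated Attention, giving 48 Gated DeltaNet layers and 16 Gated Attention layers, each followed by its own FFN.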

Benchmark Results -- Language

Benchmark	Qwen3.5-27B	Qwen3.5-397B-A17B	Gemma4-31B	Claude 4.5 Opus	Qwen3.6-35B-A3B	Qwen3.6-27B
SWE-bench Verified	75.0	76.2	52.0	80.9	73.4	77.2
SWE-bench Pro	51.2	50.9	35.7	57.1	49.5	53.5
SWE-bench Multilingual	69.3	69.3	51.7	77.5	67.2	71.3
Terminal-Bench 2.0	41.6	52.5	42.9	59.3	51.5	59.3
SkillsBench Avg5	27.2	30.0	23.6	45.3	28.7	48.2
QwenWebBench	1068	1186	1197	1536	1397	1487
NL2Repo	27.3	32.2	15.5	43.2	29.4	36.2
Claw-Eval Avg	64.3	70.7	48.5	76.6	68.7	72.4
Claw-Eval Pass^3	46.2	48.1	25.0	59.6	50.0	60.6
QwenClawBench	52.2	51.8	41.7	52.3	52.6	53.4
MMLU-Pro	86.1	87.8	85.2	89.5	85.2	86.2
MMLU-Redux	93.2	94.9	93.7	95.6	93.3	93.5
SuperGPQA	65.6	70.4	65.7	70.6	64.7	66.0
C-Eval	90.5	93.0	82.6	92.2	90.0	91.4
GPQA Diamond	85.5	88.4	84.3	87.0	86.0	87.8
HLE	24.3	28.7	19.5	30.8	21.4	24.0
LiveCodeBench v6	80.7	83.6	80.0	84.8	80.4	83.9
HMMT Feb 25	92.0	94.8	88.7	92.9	90.7	93.8
HMMT Nov 25	89.8	92.7	87.5	93.3	89.1	90.7
HMMT Feb 26	84.3	87.9	77.2	85.3	83.6	84.3
IMOAnswerBench	79.9	80.9	74.5	84.0	78.9	80.8
AIME26	92.6	93.3	89.2	95.1	92.7	94.1

Benchmark Results -- Vision Language

Benchmark	Qwen3.5-27B	Qwen3.5-397B-A17B	Gemma4-31B	Claude 4.5 Opus	Qwen3.6-35B-A3B	Qwen3.6-27B
MMMU	82.3	85.0	80.4	80.7	81.7	82.9
MMMU-Pro	75.0	79.0	76.9	70.6	75.3	75.8
MathVista mini	87.8	--	79.3	--	86.4	87.4
DynaMath	87.7	86.3	79.5	79.7	82.8	85.6
VlmsAreBlind	96.9	--	87.2	--	96.6	97.0
RealWorldQA	83.7	83.9	72.3	77.0	85.3	84.1
MMStar	81.0	83.8	77.3	73.2	80.7	81.4
MMBenchEN-DEV-v1.1	92.6	--	90.9	--	92.8	92.3
SimpleVQA	56.0	67.1	52.9	65.7	58.9	56.1
CharXiv RQ	79.5	80.8	67.9	68.5	78.0	78.4
CC-OCR	81.0	82.0	75.7	76.9	81.9	81.2
OCRBench	89.4	--	86.1	--	90.0	89.4
ERQA	60.5	67.5	57.5	46.8	61.8	62.5
CountBench	97.8	97.2	96.1	90.6	96.1	97.8
RefCOCO avg	90.9	92.3	--	--	92.0	92.5
EmbSpatialBench	84.5	--	--	--	84.3	84.6
RefSpatialBench	67.7	--	4.7	--	64.3	70.0
VideoMME (w sub.)	87.0	87.5	--	77.7	86.6	87.7
VideoMMMU	82.3	84.7	81.6	84.4	83.7	84.4
MLVU	85.9	86.7	--	81.7	86.2	86.6
MVBench	74.6	77.6	--	67.2	74.6	75.5
V*	93.7	95.8	--	67.0	90.1	94.7
AndroidWorld	64.2	--	--	--	--	70.3

Serving Frameworks

SGLang (>=0.5.10)
vLLM (>=0.19.0)
KTransformers
HuggingFace Transformers

Sampling Parameters

Mode	Temperature	top_p	top_k	min_p	presence_penalty
Thinking (general)	1.0	0.95	20	0.0	0.0
Thinking (precise coding/WebDev)	0.6	0.95	20	--	--
Non-thinking / Instruct	0.7	0.80	20	--	1.5
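
The recommended settings can be kept as named presets so that a client picks one per request. In this sketch the preset keys and the helper name are hypothetical; entries shown as -- in the table are stored as None and left out of the result, so the serving framework's defaults apply to them.

```python
# Values copied from the table above; None marks a cell shown as "--".
SAMPLING_PRESETS = {
    "thinking_general": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20,
        "min_p": 0.0, "presence_penalty": 0.0,
    },
    "thinking_coding": {
        "temperature": 0.6, "top_p": 0.95, "top_k": 20,
        "min_p": None, "presence_penalty": None,
    },
    "non_thinking": {
        "temperature": 0.7, "top_p": 0.80, "top_k": 20,
        "min_p": None, "presence_penalty": 1.5,
    },
}


def sampling_params(mode):
    """Return the preset for a mode, omitting values the table leaves unspecified."""
    return {k: v for k, v in SAMPLING_PRESETS[mode].items() if v is not None}
```

The returned dictionary can be merged into a generation request; note that top_k and min_p are not part of the standard OpenAI request schema, so whether they are accepted depends on the serving framework.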

Key Features

Thinking mode is on by default; can be disabled via enable_thinking: False.
The soft switch from Qwen3 (/think and /nothink) is not supported.
Preserve Thinking: preserve_thinking: True retains reasoning traces from history.
Supports text, image, and video inputs.
Multi-Token Prediction (MTP) supported.
Native context length: 262,144 tokens; extensible to 1,010,000 tokens with YaRN RoPE scaling.
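
The two thinking-related flags above are chat-template options. One way to pass them when serving through an OpenAI-compatible endpoint is a `chat_template_kwargs` field in the request body, which vLLM accepts; that `preserve_thinking` is forwarded the same way as `enable_thinking` is an assumption here, and the helper names are hypothetical.

```python
def thinking_kwargs(enable_thinking=True, preserve_thinking=False):
    """Chat-template options named in the feature list above."""
    return {
        "enable_thinking": enable_thinking,
        "preserve_thinking": preserve_thinking,
    }


def with_thinking_options(payload, **options):
    """Return a copy of a chat-completions payload with template options attached.

    Assumption: the server reads them from the "chat_template_kwargs" field.
    """
    updated = dict(payload)
    updated["chat_template_kwargs"] = thinking_kwargs(**options)
    return updated
```

With Transformers, the same options would instead be passed as keyword arguments to `tokenizer.apply_chat_template`, for example `enable_thinking=False`.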

Citation

@misc{qwen3.6-27b,
    title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
    author = {{Qwen Team}},
    month  = {April},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.6-27b}
}
