This is an Int4 AutoRound quantization of Qwen/Qwen3.6-27B, produced using spark-auto-round.
| Parameter | Value |
|---|---|
| Original Model | Qwen/Qwen3.6-27B |
| Quantization Method | AutoRound (W4A16, symmetric) |
| Bits | 4 |
| Group Size | 128 |
| Calibration Dataset | opencode-instruct |
| Calibration Samples | 512 |
| Calibration Sequence Length | 2048 |
| Tuning Iterations | 1000 |
| Batch Size | 8 |
| Packing Format | auto_round:auto_gptq |
| AutoRound Version | 0.14.2 |
| Model Size | ~19 GB |
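To make the "W4A16, symmetric" and "Group Size 128" rows concrete, here is a minimal plain-Python sketch of symmetric group-wise 4-bit weight quantization. It uses simple round-to-nearest, which is not AutoRound itself (AutoRound tunes the rounding over the calibration data). The function names and the choice of `max|w| / 7` as the per-group scale are illustrative assumptions, not taken from the quantization run.

```python
import math

def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight group (illustrative)."""
    qmax = 2 ** (bits - 1) - 1      # 7 for int4
    qmin = -(2 ** (bits - 1))       # -8 for int4
    max_abs = max(abs(w) for w in weights)
    # Assumption: one scale per group, chosen so the largest weight maps to qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale

def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups and quantize each group independently."""
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]

def dequantize_row(groups):
    """Recover approximate weights; activations stay 16-bit in a W4A16 scheme."""
    return [code * scale for codes, scale in groups for code in codes]

# Toy data standing in for one row of a weight matrix.
row = [0.05 * math.sin(0.1 * i) for i in range(256)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_err)
```

With round-to-nearest, the reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups (more scales) tend to reduce error at the cost of extra storage.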
The `linear_attn.in_proj_a` and `linear_attn.in_proj_b` projections in every DeltaNet layer, as well as `mtp.fc`, are kept in FP16 to preserve quality.
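As a rough illustration of that exclusion rule, the sketch below separates module names into those left in FP16 and those quantized. The full module paths (for example `model.layers.0.linear_attn.in_proj_a`) are hypothetical; only the three suffixes come from the description above.

```python
# Suffixes of the modules that stay in FP16, taken from the description above.
FP16_SUFFIXES = ("linear_attn.in_proj_a", "linear_attn.in_proj_b", "mtp.fc")

def stays_fp16(module_name):
    """Return True if the module is excluded from 4-bit quantization."""
    return module_name.endswith(FP16_SUFFIXES)

def split_modules(module_names):
    """Partition module names into (fp16, quantized) lists."""
    fp16 = [name for name in module_names if stays_fp16(name)]
    quantized = [name for name in module_names if not stays_fp16(name)]
    return fp16, quantized

# Hypothetical module paths, used only to exercise the filter.
example_names = [
    "model.layers.0.linear_attn.in_proj_a",
    "model.layers.0.linear_attn.in_proj_b",
    "model.layers.0.mlp.down_proj",
    "mtp.fc",
]
fp16, quantized = split_modules(example_names)
print(fp16)
print(quantized)
```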
Sensitivity analysis covered all 64 transformer blocks: 63 were rated PASS and 1 (layer 58) was rated WARN.
| Layer Range | Cosine Similarity | PSNR (dB) |
|---|---|---|
| Layers 0-10 | 0.9999 - 1.0000 | 80.7 - 84.0 |
| Layers 11-20 | 0.9995 - 0.9999 | 74.9 - 81.5 |
| Layers 21-30 | 0.9988 - 0.9995 | 73.6 - 78.7 |
| Layers 31-40 | 0.9976 - 0.9986 | 69.4 - 73.2 |
| Layers 41-50 | 0.9943 - 0.9976 | 60.2 - 69.2 |
| Layers 51-63 | 0.9883 - 0.9934 | 53.4 - 66.5 |
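For readers unfamiliar with the two metrics in the table, the following sketch shows one common way to compute them between a reference output and a quantized output. The report's exact definitions are not given here, so the choice of tensors and the use of the reference's peak absolute value in the PSNR are assumptions.

```python
import math

def cosine_similarity(ref, test):
    """Cosine of the angle between two equally sized, non-zero vectors."""
    dot = sum(r * t for r, t in zip(ref, test))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_test = math.sqrt(sum(t * t for t in test))
    return dot / (norm_ref * norm_test)

def psnr_db(ref, test):
    """Peak signal-to-noise ratio in dB, using max |ref| as the peak (assumption)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical outputs
    peak = max(abs(r) for r in ref)
    return 10 * math.log10(peak ** 2 / mse)

# Toy vectors standing in for a block's FP16 output and its quantized output.
reference = [1.0, 0.0, -1.0, 0.0]
quantized = [0.9, 0.0, -1.0, 0.1]
print(cosine_similarity(reference, quantized), psnr_db(reference, quantized))
```

Both metrics move in the same direction as the table: values closer to 1.0 and higher dB mean the quantized block tracks the original more closely.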
Full per-layer reports are available in the repository: quantization-report.txt and quantization-report.csv.
```shell
vllm serve coder3101/Qwen3.6-27B-int4-AutoRound
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "coder3101/Qwen3.6-27B-int4-AutoRound"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
prompt = "Explain the theory of relativity in simple terms."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
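Once the `vllm serve` command above is running, the model can also be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library. The base URL assumes vLLM's default host and port, and the helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: default vLLM host and port
MODEL = "coder3101/Qwen3.6-27B-int4-AutoRound"

def build_chat_request(prompt, max_tokens=512, base_url=BASE_URL):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# Usage (requires the server to be running):
# print(chat("Explain the theory of relativity in simple terms."))
```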
Below is the original model card from Qwen/Qwen3.6-27B.
Qwen3.6-27B follows the Qwen3.5 series with key upgrades:
| Property | Value |
|---|---|
| Type | Causal Language Model with Vision Encoder |
| Parameters | 27B |
| Hidden Dimension | 5120 |
| Token Embedding | 248320 (Padded) |
| Number of Layers | 64 |
| Hidden Layout | 16 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN)) |
| FFN Intermediate Dimension | 17408 |
| Context Length | 262,144 (natively), up to 1,010,000 with YaRN |
- Gated DeltaNet: 48 linear attention heads for V, 16 for QK (head dim: 128)
- Gated Attention: 24 heads for Q, 4 for KV (head dim: 256, RoPE dim: 64)
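The hidden layout and head counts above can be turned into a quick back-of-the-envelope calculation. The sketch below expands the `16 x (3 x DeltaNet -> 1 x Attention)` pattern into a per-layer list and estimates the KV cache of the Gated Attention layers. It assumes 0-based layer indices, a 16-bit KV cache, and a single sequence, and it ignores the DeltaNet layers' recurrent state, whose size does not depend on context length.

```python
def expand_layout(repeats=16, deltanet_per_block=3, attention_per_block=1):
    """Expand the repeating block pattern into one entry per layer."""
    layers = []
    for _ in range(repeats):
        layers += ["gated_deltanet"] * deltanet_per_block
        layers += ["gated_attention"] * attention_per_block
    return layers

def kv_cache_bytes_per_token(num_attention_layers, kv_heads=4, head_dim=256,
                             bytes_per_value=2):
    """Bytes of K and V stored per token across all full-attention layers."""
    return num_attention_layers * 2 * kv_heads * head_dim * bytes_per_value

layers = expand_layout()
attention_layers = [i for i, kind in enumerate(layers) if kind == "gated_attention"]

per_token = kv_cache_bytes_per_token(len(attention_layers))
full_context_gib = per_token * 262_144 / 2**30  # native context length

print(len(layers), attention_layers[:4], per_token, full_context_gib)
# 64 [3, 7, 11, 15] 65536 16.0
```

Under these assumptions only 16 of the 64 layers keep a growing KV cache, about 64 KiB per token, or roughly 16 GiB for one sequence at the native 262,144-token context.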
| Benchmark | Qwen3.5-27B | Qwen3.5-397B-A17B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 76.2 | 52.0 | 80.9 | 73.4 | 77.2 |
| SWE-bench Pro | 51.2 | 50.9 | 35.7 | 57.1 | 49.5 | 53.5 |
| SWE-bench Multilingual | 69.3 | 69.3 | 51.7 | 77.5 | 67.2 | 71.3 |
| Terminal-Bench 2.0 | 41.6 | 52.5 | 42.9 | 59.3 | 51.5 | 59.3 |
| SkillsBench Avg5 | 27.2 | 30.0 | 23.6 | 45.3 | 28.7 | 48.2 |
| QwenWebBench | 1068 | 1186 | 1197 | 1536 | 1397 | 1487 |
| NL2Repo | 27.3 | 32.2 | 15.5 | 43.2 | 29.4 | 36.2 |
| Claw-Eval Avg | 64.3 | 70.7 | 48.5 | 76.6 | 68.7 | 72.4 |
| Claw-Eval Pass^3 | 46.2 | 48.1 | 25.0 | 59.6 | 50.0 | 60.6 |
| QwenClawBench | 52.2 | 51.8 | 41.7 | 52.3 | 52.6 | 53.4 |
| MMLU-Pro | 86.1 | 87.8 | 85.2 | 89.5 | 85.2 | 86.2 |
| MMLU-Redux | 93.2 | 94.9 | 93.7 | 95.6 | 93.3 | 93.5 |
| SuperGPQA | 65.6 | 70.4 | 65.7 | 70.6 | 64.7 | 66.0 |
| C-Eval | 90.5 | 93.0 | 82.6 | 92.2 | 90.0 | 91.4 |
| GPQA Diamond | 85.5 | 88.4 | 84.3 | 87.0 | 86.0 | 87.8 |
| HLE | 24.3 | 28.7 | 19.5 | 30.8 | 21.4 | 24.0 |
| LiveCodeBench v6 | 80.7 | 83.6 | 80.0 | 84.8 | 80.4 | 83.9 |
| HMMT Feb 25 | 92.0 | 94.8 | 88.7 | 92.9 | 90.7 | 93.8 |
| HMMT Nov 25 | 89.8 | 92.7 | 87.5 | 93.3 | 89.1 | 90.7 |
| HMMT Feb 26 | 84.3 | 87.9 | 77.2 | 85.3 | 83.6 | 84.3 |
| IMOAnswerBench | 79.9 | 80.9 | 74.5 | 84.0 | 78.9 | 80.8 |
| AIME26 | 92.6 | 93.3 | 89.2 | 95.1 | 92.7 | 94.1 |
| Benchmark | Qwen3.5-27B | Qwen3.5-397B-A17B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| MMMU | 82.3 | 85.0 | 80.4 | 80.7 | 81.7 | 82.9 |
| MMMU-Pro | 75.0 | 79.0 | 76.9 | 70.6 | 75.3 | 75.8 |
| MathVista mini | 87.8 | -- | 79.3 | -- | 86.4 | 87.4 |
| DynaMath | 87.7 | 86.3 | 79.5 | 79.7 | 82.8 | 85.6 |
| VlmsAreBlind | 96.9 | -- | 87.2 | -- | 96.6 | 97.0 |
| RealWorldQA | 83.7 | 83.9 | 72.3 | 77.0 | 85.3 | 84.1 |
| MMStar | 81.0 | 83.8 | 77.3 | 73.2 | 80.7 | 81.4 |
| MMBenchEN-DEV-v1.1 | 92.6 | -- | 90.9 | -- | 92.8 | 92.3 |
| SimpleVQA | 56.0 | 67.1 | 52.9 | 65.7 | 58.9 | 56.1 |
| CharXiv RQ | 79.5 | 80.8 | 67.9 | 68.5 | 78.0 | 78.4 |
| CC-OCR | 81.0 | 82.0 | 75.7 | 76.9 | 81.9 | 81.2 |
| OCRBench | 89.4 | -- | 86.1 | -- | 90.0 | 89.4 |
| ERQA | 60.5 | 67.5 | 57.5 | 46.8 | 61.8 | 62.5 |
| CountBench | 97.8 | 97.2 | 96.1 | 90.6 | 96.1 | 97.8 |
| RefCOCO avg | 90.9 | 92.3 | -- | -- | 92.0 | 92.5 |
| EmbSpatialBench | 84.5 | -- | -- | -- | 84.3 | 84.6 |
| RefSpatialBench | 67.7 | -- | 4.7 | -- | 64.3 | 70.0 |
| VideoMME (w sub.) | 87.0 | 87.5 | -- | 77.7 | 86.6 | 87.7 |
| VideoMMMU | 82.3 | 84.7 | 81.6 | 84.4 | 83.7 | 84.4 |
| MLVU | 85.9 | 86.7 | -- | 81.7 | 86.2 | 86.6 |
| MVBench | 74.6 | 77.6 | -- | 67.2 | 74.6 | 75.5 |
| V* | 93.7 | 95.8 | -- | 67.0 | 90.1 | 94.7 |
| AndroidWorld | 64.2 | -- | -- | -- | -- | 70.3 |
| Mode | Temperature | top_p | top_k | min_p | presence_penalty |
|---|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 | 0.0 |
| Thinking (precise coding/WebDev) | 0.6 | 0.95 | 20 | -- | -- |
| Non-thinking / Instruct | 0.7 | 0.80 | 20 | -- | 1.5 |
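The table above can be expressed as request presets. The sketch below builds a chat-completions payload for each mode, leaving out the parameters shown as `--` so the server default applies. The preset keys and the helper `build_payload` are illustrative names, and passing `enable_thinking` through `chat_template_kwargs` assumes a server that accepts that field, as vLLM's OpenAI-compatible server does.

```python
# Values copied from the sampling table; "--" entries are omitted.
SAMPLING_PRESETS = {
    "thinking_general": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20,
        "min_p": 0.0, "presence_penalty": 0.0,
    },
    "thinking_coding": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "non_thinking": {
        "temperature": 0.7, "top_p": 0.80, "top_k": 20, "presence_penalty": 1.5,
    },
}

def build_payload(model, prompt, mode):
    """Return a chat-completions request body with the preset for `mode`."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    payload.update(SAMPLING_PRESETS[mode])
    if mode == "non_thinking":
        # Assumption: the server forwards this flag to the chat template.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload

example = build_payload(
    "coder3101/Qwen3.6-27B-int4-AutoRound",
    "Write a function that reverses a linked list.",
    "thinking_coding",
)
print(example)
```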
- Thinking can be disabled with `enable_thinking: False`.
- … (`/think` and `/nothink` from Qwen3).
- `preserve_thinking: True` retains reasoning traces from history.

@misc{qwen3.6-27b,
title = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
author = {{Qwen Team}},
month = {April},
year = {2026},
url = {https://qwen.ai/blog?id=qwen3.6-27b}
}