chaddy81/Qwen3.6-27b-coder-8bit-mlx

This is an 8-bit quantized MLX conversion of chaddy81/Qwen3.6-27b-coder, for fast local inference on Apple Silicon.

Details

Base model: chaddy81/Qwen3.6-27b-coder
Architecture: Qwen3_5ForConditionalGeneration (qwen3_5) — vision-language model (text + image + video)
Text backbone: 64 layers, hidden 5120, 24 attention heads / 4 KV heads (GQA), vocab 248320, context length 262144 (256K)
Vision tower: 27 layers, hidden 1152
Quantization: 8-bit, 8.501 bits-per-weight (MLX affine), group size 64
Files: 6 safetensors shards, ~27 GB
Converted with: mlx_lm 0.31.1, source dtype bfloat16

Usage

pip install mlx-lm

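from mlx_lm import load, generate

# Fetches the weights from the Hub on first use, then reads them from the local cache.
model, tokenizer = load("chaddy81/Qwen3.6-27b-coder-8bit-mlx")
prompt = "Write a Python function that returns the nth Fibonacci number."
messages = [{"role": "user", "content": prompt}]
# Wrap the request in the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# verbose=True already prints the output as it is generated, so keep the
# return value instead of printing it a second time.
response = generate(model, tokenizer, prompt=text, max_tokens=512, verbose=True)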

Or from the CLI:

mlx_lm.generate --model chaddy81/Qwen3.6-27b-coder-8bit-mlx --prompt "Explain async/await in Python."

Quantization variants

Variant	Bits per weight	Size	Repo
8-bit	8.5	~27 GB	chaddy81/Qwen3.6-27b-coder-8bit-mlx
6-bit	6.5	~20 GB	chaddy81/Qwen3.6-27b-coder-6bit-mlx
4-bit	4.5	~14 GB	chaddy81/Qwen3.6-27b-coder-4bit-mlx
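
The bits-per-weight column can be reproduced with a little arithmetic. The sketch below assumes that group-wise affine quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights, and that all 27e9 parameters are quantized; both are simplifying assumptions rather than figures taken from the conversion log, and the helper names are illustrative.

```python
# Assumptions (not from the model card): a 16-bit scale and a 16-bit bias are
# stored per group, and every one of the 27e9 parameters is quantized.
SCALE_BITS = 16
BIAS_BITS = 16


def bits_per_weight(bits: int, group_size: int = 64) -> float:
    """Effective bits per weight for group-wise affine quantization."""
    # Each group of `group_size` weights shares one scale and one bias.
    return bits + (SCALE_BITS + BIAS_BITS) / group_size


def approx_size_gib(n_params: float, bpw: float) -> float:
    """Rough weight footprint in GiB (2**30 bytes)."""
    return n_params * bpw / 8 / 2**30


for bits in (8, 6, 4):
    bpw = bits_per_weight(bits)
    print(f"{bits}-bit: {bpw} bpw, ~{approx_size_gib(27e9, bpw):.1f} GiB")
```

Under these assumptions the overhead is 32 / 64 = 0.5 bits per weight, which gives the 8.5, 6.5, and 4.5 values in the table, and the size estimates come out at roughly 26.7, 20.4, and 14.1 GiB. The measured 8.501 bits-per-weight is slightly higher, presumably because a few tensors are kept unquantized.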

Notes

Conversion emitted a tokenizer regex warning referencing a Mistral discussion; this is a generic tokenizers notice and does not affect the Qwen3.5 architecture or quantized weights. If you observe unexpected tokenization, load the tokenizer with fix_mistral_regex=True.
License inherited from the base model; refer to chaddy81/Qwen3.6-27b-coder.
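
If the regex fix is needed, one way to pass the flag through mlx_lm is sketched below. It assumes that the installed mlx_lm `load` accepts a `tokenizer_config` dictionary and forwards it to the Hugging Face tokenizer loader, and that the installed transformers version recognises `fix_mistral_regex`; the helper name `load_with_regex_fix` is hypothetical.

```python
# Extra keyword arguments intended for the Hugging Face tokenizer loader.
TOKENIZER_CONFIG = {"fix_mistral_regex": True}


def load_with_regex_fix(repo: str = "chaddy81/Qwen3.6-27b-coder-8bit-mlx"):
    """Load the model and tokenizer with the regex fix enabled (hypothetical helper)."""
    # Imported lazily so the snippet can be read and checked without MLX installed.
    from mlx_lm import load

    # Assumption: tokenizer_config is forwarded to the tokenizer loader as kwargs.
    return load(repo, tokenizer_config=TOKENIZER_CONFIG)
```

Calling `model, tokenizer = load_with_regex_fix()` would then replace the plain `load(...)` call in the usage example.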
