Qwopus3.6-27B-Coder-MTP-NVFP4-GGUF

This repo contains two experimental NVFP4 GGUF quantizations of Jackrong's excellent Qwopus3.6-27B-Coder for llama.cpp.
Both were quantized with my experimental advanced-gguf-quantizer tool.
No imatrix was used, in order to stay closer to the original fine-tuning done by Jackrong.

This repository contains two NVFP4 variants:

| Variant | File | Best for | Notes |
|---|---|---|---|
| TURBO | Qwopus3.6-27B-Coder-MTP-NVFP4-TURBO.gguf | Max speed | More NVFP4. Lower quality metrics. |
| HQ | Qwopus3.6-27B-Coder-MTP-NVFP4-HQ.gguf | Better quality | More tensors promoted. Slightly slower. |
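
As a rough sketch of how one of the two files could be fetched from Python: the snippet below maps a variant name to its file name from the table and downloads it with `hf_hub_download` from the `huggingface_hub` package. The repo id is an assumption taken from the model page, and `variant_filename` and `download_variant` are hypothetical helper names used only for illustration.

```python
# Assumption: repo id as shown on the model page; file names are from the variants table.
REPO_ID = "michaelw9999/Qwopus3.6-27B-Coder-MTP-NVFP4-GGUF"

VARIANT_FILES = {
    "TURBO": "Qwopus3.6-27B-Coder-MTP-NVFP4-TURBO.gguf",
    "HQ": "Qwopus3.6-27B-Coder-MTP-NVFP4-HQ.gguf",
}


def variant_filename(variant: str) -> str:
    """Return the GGUF file name for a variant name (case-insensitive)."""
    key = variant.strip().upper()
    if key not in VARIANT_FILES:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANT_FILES)}")
    return VARIANT_FILES[key]


def download_variant(variant: str) -> str:
    """Download the chosen variant into the local Hugging Face cache and return its path."""
    # Imported lazily so the name lookup above works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(repo_id=REPO_ID, filename=variant_filename(variant))


if __name__ == "__main__":
    # Downloads roughly 15 to 17 GB depending on the variant.
    print(download_variant("TURBO"))
```

The returned path can then be passed to llama.cpp as the model file.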

Quality & Speed Results

All PPL/KLD results were measured against the same BF16 wikitest KLD base.
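
For readers unfamiliar with these metrics, here is a minimal sketch of what PPL ratio, mean KLD, and same-top-p measure. It is an illustration on toy logits, not the code llama.cpp uses to produce the numbers in this section. The function name `compare_models` and the toy inputs are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert one row of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q): how far the quantized distribution q is from the reference p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def compare_models(ref_logits, quant_logits, targets):
    """Compare per-token logits of a reference model and a quantized model.

    ref_logits / quant_logits: one list of logits per token position.
    targets: index of the true next token at each position.
    """
    ref_nll = quant_nll = kld_sum = 0.0
    same_top = 0
    for ref_row, quant_row, target in zip(ref_logits, quant_logits, targets):
        p = softmax(ref_row)
        q = softmax(quant_row)
        ref_nll -= math.log(p[target])
        quant_nll -= math.log(q[target])
        kld_sum += kl_divergence(p, q)
        # Count positions where both models rank the same token first.
        same_top += p.index(max(p)) == q.index(max(q))
    n = len(targets)
    return {
        "ppl_ratio": math.exp(quant_nll / n) / math.exp(ref_nll / n),
        "mean_kld": kld_sum / n,
        "same_top": same_top / n,
    }


if __name__ == "__main__":
    ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]      # toy reference logits
    quant = [[1.8, 1.1, 0.2], [0.6, 2.2, 0.4]]    # toy "quantized" logits
    print(compare_models(ref, quant, targets=[0, 1]))
```

Comparing a model against itself gives a PPL ratio of 1, a mean KLD of 0, and full top-token agreement, so lower KLD and a ratio closer to 1 mean the quantized model tracks the reference more closely.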

| Metric | TURBO | HQ | BF16 |
|---|---|---|---|
| Size | 15.12 GB | 16.98 GB | 51 GB |
| PPL ratio | 1.0348 | 1.031 | 1.000 |
| Mean KLD | 0.0414 | 0.0379 | 0.0000 |
| Same top p | 91.62% | 92.03% | 100% |
| pp512 | 5402.97 tk/s | 5104.84 tk/s | - |
| tg128 | 83.44 tk/s | 76.38 tk/s | - |
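
To put the size and speed figures in relative terms, the short calculation below uses only the numbers from the results table. The dictionary layout and the helper names `percent_change` and `summarize` are illustrative assumptions.

```python
# Figures copied from the results table (sizes in GB, speeds in tk/s).
RESULTS = {
    "TURBO": {"size_gb": 15.12, "pp512": 5402.97, "tg128": 83.44},
    "HQ": {"size_gb": 16.98, "pp512": 5104.84, "tg128": 76.38},
}
BF16_SIZE_GB = 51.0


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def summarize() -> dict:
    """Express HQ relative to TURBO, and both relative to the BF16 size."""
    turbo, hq = RESULTS["TURBO"], RESULTS["HQ"]
    return {
        "hq_size_vs_turbo_pct": percent_change(hq["size_gb"], turbo["size_gb"]),
        "hq_pp512_vs_turbo_pct": percent_change(hq["pp512"], turbo["pp512"]),
        "hq_tg128_vs_turbo_pct": percent_change(hq["tg128"], turbo["tg128"]),
        "turbo_size_of_bf16_pct": turbo["size_gb"] / BF16_SIZE_GB * 100.0,
        "hq_size_of_bf16_pct": hq["size_gb"] / BF16_SIZE_GB * 100.0,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:+.2f}")
```

By these figures HQ is roughly 12% larger than TURBO, about 5.5% slower at prompt processing, and about 8.5% slower at token generation, while both files are around 30% to 33% of the BF16 size.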

Evaluation Results

Further evaluation tests are underway to identify real world performance differences between TURBO and HQ.

| Benchmark | Samples | TURBO | HQ |
|---|---|---|---|
| GSM8K | - | -% | -% |
| HellaSwag | - | -% | -% |
| HumanEval | - | -% | -% |