Nex-N2-Pro-397B-A17B-EXL3 Pareto Frontier
Pareto-frontier EXL3 quants for Nex N2 Pro 397B.
Pick the quant with the lowest KL divergence and perplexity (PPL) that fits your hardware. Each quant is hosted in its own model repository.
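The table below is a size/quality trade-off, so the choice can be scripted. The following is a minimal sketch, not part of the original tooling. It copies the GiB and first KL column from the table below, keeps the Pareto-optimal rows (no other row is both smaller and lower-KL), and returns the lowest-KL quant that fits a memory budget. It ranks by KL only because PPL is not monotonic across these quants (3.7bpw has a slightly higher PPL than 3.55bpw). `reserve_gib` is a hypothetical allowance for KV cache, activations and backend overhead. The right value depends on context length and backend, and this card does not give one.

```python
# Rows copied from the table: (name, size in GiB, first KL column).
QUANTS = [
    ("cpral 2bpw", 97, 0.1682),
    ("cpral 2.65bpw", 124, 0.0685),
    ("cpral 3bpw", 144, 0.0491),
    ("cpral 3.2bpw", 151, 0.0396),
    ("cpral 3.4bpw", 163, 0.0282),
    ("cpral 3.55bpw", 168, 0.0235),
    ("cpral 3.7bpw", 177, 0.0216),
    ("cpral 4bpw", 192, 0.0188),
    ("cpral 4.5bpw", 213, 0.0126),
    ("cpral 5bpw", 240, 0.0096),
]


def pareto_frontier(rows):
    """Keep rows that no other row beats on both size and KL (lower is better for both)."""
    frontier = []
    best_kl = float("inf")
    # Walk from smallest to largest; a row survives only if it lowers the best KL seen so far.
    for name, size, kl in sorted(rows, key=lambda r: (r[1], r[2])):
        if kl < best_kl:
            frontier.append((name, size, kl))
            best_kl = kl
    return frontier


def pick_quant(rows, budget_gib, reserve_gib=0.0):
    """Return the lowest-KL frontier quant that fits in budget_gib - reserve_gib, or None."""
    usable = budget_gib - reserve_gib
    fitting = [r for r in pareto_frontier(rows) if r[1] <= usable]
    return min(fitting, key=lambda r: r[2]) if fitting else None
```

For example, `pick_quant(QUANTS, budget_gib=192, reserve_gib=16)` returns the 3.55bpw quant, because the 3.7bpw quant (177 GiB) no longer fits in the remaining 176 GiB.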
| Quant | GiB | GB | bpw | PPL | KL(q→o) | KL(o→q) | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| cpral 2bpw | 97 | 106 | 2.04 | 3.623 | 0.1682 | 0.2205 | 87.1% | 59.0% | 34.2% | 18.4% | 9.5% |
| cpral 2.65bpw | 124 | 134 | 2.65 | 3.342 | 0.0685 | 0.0749 | 92.3% | 71.7% | 49.7% | 32.0% | 19.7% |
| cpral 3bpw | 144 | 155 | 3.04 | 3.334 | 0.0491 | 0.0522 | 93.4% | 75.1% | 54.2% | 36.3% | 23.1% |
| cpral 3.2bpw | 151 | 162 | 3.20 | 3.313 | 0.0396 | 0.0418 | 94.3% | 77.5% | 57.7% | 40.0% | 26.6% |
| cpral 3.4bpw | 163 | 174 | 3.40 | 3.299 | 0.0282 | 0.0293 | 95.1% | 80.3% | 61.8% | 44.6% | 30.8% |
| cpral 3.55bpw | 168 | 180 | 3.55 | 3.288 | 0.0235 | 0.0239 | 95.5% | 81.9% | 64.4% | 47.6% | 33.6% |
| cpral 3.7bpw | 177 | 189 | 3.70 | 3.292 | 0.0216 | 0.0221 | 95.7% | 82.5% | 65.4% | 48.7% | 34.7% |
| cpral 4bpw | 192 | 204 | 4.04 | 3.293 | 0.0188 | 0.0193 | 95.9% | 83.4% | 66.9% | 50.5% | 36.5% |
| cpral 4.5bpw | 213 | 226 | 4.50 | 3.281 | 0.0126 | 0.0129 | 96.7% | 86.0% | 71.3% | 55.9% | 42.0% |
| cpral 5bpw | 240 | 254 | 5.04 | 3.282 | 0.0096 | 0.0097 | 97.1% | 87.6% | 74.3% | 59.9% | 46.3% |
| BF16 original | 752 | 807 | 16.00 | 3.271 | - | - | - | - | - | - | - |
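The following pure-Python sketch illustrates what the KL and Top-K columns appear to measure. It works from per-position logits of the quantized and original models. It is not the exllamav3 evaluation code that produced the numbers above, and it rests on two assumptions. First, the two KL columns are taken to be KL(P_quant ‖ P_orig) and KL(P_orig ‖ P_quant), in that order. Second, a Top-K hit is assumed to require the K highest-ranked tokens of both models to match in the same order. That strict criterion would explain why the percentages fall so quickly as K grows.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl(logp, logq):
    """KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x)) for one position."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def top_k(logits, k):
    """Token ids of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]


def compare(quant_logits, orig_logits, max_k=5):
    """Average both KL directions and top-k agreement over all positions."""
    n = len(orig_logits)
    kl_qo = kl_oq = 0.0
    agree = [0] * max_k
    for q, o in zip(quant_logits, orig_logits):
        lq, lo = log_softmax(q), log_softmax(o)
        kl_qo += kl(lq, lo)  # assumed meaning of KL(q→o): KL(P_quant || P_orig)
        kl_oq += kl(lo, lq)  # assumed meaning of KL(o→q): KL(P_orig || P_quant)
        for k in range(1, max_k + 1):
            # Counted as a hit only if the ranked top-k lists are identical.
            agree[k - 1] += top_k(q, k) == top_k(o, k)
    return kl_qo / n, kl_oq / n, [a / n for a in agree]
```

Both KL values are 0 and every agreement rate is 1.0 when the two models produce identical logits, which is why the BF16 reference row has nothing to report in those columns.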
Methodology
The methodology I used to create the custom quants is documented in https://github.com/adamo1139/qwen397b-exl3 and is mostly reproducible (I may have made minor manual overrides to some of the auto-generated configs). The custom override configs are included in the model repository of every quant produced this way. I reused Goldkoron's per-module sensitivity data from Qwen 3.5 397B in the hope that this model would have a similar sensitivity profile, but that does not quite seem to be the case: the custom optimized quants do not outperform the larger baseline quants and only land in between them.
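The linked repository documents the actual procedure. The sketch below only illustrates the general idea of spending a size budget according to per-module sensitivity. It is a hypothetical greedy allocator: every module starts at the lower baseline's size, and modules are upgraded to the higher baseline in order of estimated KL reduction per extra GiB until the budget runs out. The module names, sizes and `kl_gain` values are made up, and the sketch assumes those gains simply add up. It is not the optimizer from the linked repository, and it has no superlinear penalty.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str        # hypothetical module name
    low_gib: float   # size when taken from the lower-bitrate baseline
    high_gib: float  # size when taken from the higher-bitrate baseline
    kl_gain: float   # assumed (additive) KL reduction from upgrading this module


def allocate(modules, budget_gib):
    """Greedily upgrade modules with the best KL gain per extra GiB while staying under budget.

    Returns the set of upgraded module names and the resulting total size in GiB.
    """
    total = sum(m.low_gib for m in modules)
    upgraded = set()
    # Best "value per GiB" first; the max() guards against a zero size difference.
    ranked = sorted(
        modules,
        key=lambda m: m.kl_gain / max(m.high_gib - m.low_gib, 1e-9),
        reverse=True,
    )
    for m in ranked:
        extra = m.high_gib - m.low_gib
        if total + extra <= budget_gib:
            upgraded.add(m.name)
            total += extra
    return upgraded, total
```

With the made-up modules in the test below and a 100 GiB budget, the small `shared_expert` and `attn` modules are upgraded first. The expensive `experts` upgrade no longer fits. If the sensitivity numbers do not reflect the real model, as seems to be the case here, the result is just a mix of the two baselines.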
Credits
Thanks to @mratsim for sharing his methodology. Thanks to @Goldkoron for sharing the per-module KLD sensitivity chart for Qwen 3.5 397B. Thanks to @turboderp for creating exllamav3.
Potential for future work
Future work could produce better quants by redoing the sensitivity analysis for Nex N2 Pro, tweaking the superlinear penalty, incorporating 6bpw and 7bpw baselines, and quantizing individual experts to different bit widths as informed by REAP/REAM data. The methodology used here should also apply to other MoE models such as the GLM 4.5 family and Qwen 3.5 122B, and it might be applicable to the GGUF ecosystem as well.
TODO
Quantization_bits metadata is misleading: it is a constant 3.00 because it was copied over from one of the baseline quants. head_bits may be incorrect in the same way.
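Until the metadata is fixed, one way to sanity-check a quant's real size is to read its safetensors headers directly. The sketch below relies only on the safetensors file layout: an 8-byte little-endian header length, then a JSON header whose tensor entries carry `data_offsets`. It sums the tensor bytes across all shards in a directory. Dividing that total by a parameter count you supply gives an overall bits-per-weight figure. The parameter count is an input you must provide. The result also includes embeddings, the output head and any auxiliary tensors, so it will not necessarily match the bpw column above.

```python
import json
import struct
from pathlib import Path


def tensor_bytes(path):
    """Sum the byte sizes of all tensors listed in one .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string map, not a tensor
            continue
        start, end = info["data_offsets"]
        total += end - start
    return total


def effective_bpw(model_dir, num_params):
    """Average bits per weight across every *.safetensors shard in model_dir."""
    total = sum(tensor_bytes(p) for p in Path(model_dir).glob("*.safetensors"))
    return total * 8 / num_params
```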
Model tree for cpral/Nex-N2-Pro-397B-A17B-exl3
Base model: nex-agi/Nex-N2-Pro