Nex-N2-Pro-397B-A17B-EXL3 Pareto Frontier

Pareto-frontier EXL3 quants for Nex N2 Pro 397B.

Pick the quant with the lowest KL divergence and perplexity (PPL) that fits your hardware. Each quant is hosted in its own model repository.
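For reference, these are the standard definitions of the two metrics. The card does not say which ordering KL(q→o) and KL(o→q) denote. Read the two KL columns as the two directions of the divergence between the original model's next-token distribution P_o and the quant's P_q, presumably averaged over evaluation token positions (an assumption).

```latex
\mathrm{PPL} = \exp\Big(-\frac{1}{N}\sum_{t=1}^{N}\log p(x_t \mid x_{<t})\Big)

D_{\mathrm{KL}}(P_o \,\|\, P_q) = \sum_{v} P_o(v)\,\log\frac{P_o(v)}{P_q(v)},
\qquad
D_{\mathrm{KL}}(P_q \,\|\, P_o) = \sum_{v} P_q(v)\,\log\frac{P_q(v)}{P_o(v)}
```

Both directions are zero only when the quant reproduces the original distribution exactly, which is why the BF16 row has no KL values.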

Quant	GiB	GB	bpw	PPL	KL(q→o)	KL(o→q)	Top-1	Top-2	Top-3	Top-4	Top-5
cpral 2bpw	97	106	2.04	3.623	0.1682	0.2205	87.1%	59.0%	34.2%	18.4%	9.5%
cpral 2.65bpw	124	134	2.65	3.342	0.0685	0.0749	92.3%	71.7%	49.7%	32.0%	19.7%
cpral 3bpw	144	155	3.04	3.334	0.0491	0.0522	93.4%	75.1%	54.2%	36.3%	23.1%
cpral 3.2bpw	151	162	3.20	3.313	0.0396	0.0418	94.3%	77.5%	57.7%	40.0%	26.6%
cpral 3.4bpw	163	174	3.40	3.299	0.0282	0.0293	95.1%	80.3%	61.8%	44.6%	30.8%
cpral 3.55bpw	168	180	3.55	3.288	0.0235	0.0239	95.5%	81.9%	64.4%	47.6%	33.6%
cpral 3.7bpw	177	189	3.70	3.292	0.0216	0.0221	95.7%	82.5%	65.4%	48.7%	34.7%
cpral 4bpw	192	204	4.04	3.293	0.0188	0.0193	95.9%	83.4%	66.9%	50.5%	36.5%
cpral 4.5bpw	213	226	4.50	3.281	0.0126	0.0129	96.7%	86.0%	71.3%	55.9%	42.0%
cpral 5bpw	240	254	5.04	3.282	0.0096	0.0097	97.1%	87.6%	74.3%	59.9%	46.3%
BF16 original	752	807	16.00	3.271	—	—	—	—	—	—	—
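A minimal sketch of the selection rule above, using the rows of this table. It assumes the GiB column is roughly the memory the weights occupy, and it reserves a hypothetical headroom_gib for KV cache and activations. The real overhead depends on context length, cache settings and the inference backend. It also checks which rows sit on the size/KL Pareto frontier.

```python
# Rows copied from the table: (name, size_gib, bpw, ppl, kl_q_o, kl_o_q)
QUANTS = [
    ("cpral 2bpw", 97, 2.04, 3.623, 0.1682, 0.2205),
    ("cpral 2.65bpw", 124, 2.65, 3.342, 0.0685, 0.0749),
    ("cpral 3bpw", 144, 3.04, 3.334, 0.0491, 0.0522),
    ("cpral 3.2bpw", 151, 3.20, 3.313, 0.0396, 0.0418),
    ("cpral 3.4bpw", 163, 3.40, 3.299, 0.0282, 0.0293),
    ("cpral 3.55bpw", 168, 3.55, 3.288, 0.0235, 0.0239),
    ("cpral 3.7bpw", 177, 3.70, 3.292, 0.0216, 0.0221),
    ("cpral 4bpw", 192, 4.04, 3.293, 0.0188, 0.0193),
    ("cpral 4.5bpw", 213, 4.50, 3.281, 0.0126, 0.0129),
    ("cpral 5bpw", 240, 5.04, 3.282, 0.0096, 0.0097),
]


def pick_quant(vram_gib, headroom_gib=16.0):
    """Return the lowest-KL(q->o) quant whose weights fit in vram_gib - headroom_gib.

    headroom_gib is a hypothetical allowance for KV cache and activations.
    """
    budget = vram_gib - headroom_gib
    fitting = [q for q in QUANTS if q[1] <= budget]
    return min(fitting, key=lambda q: q[4]) if fitting else None


def pareto_frontier(rows, metric=4):
    """Names of rows not dominated on (size_gib, rows[metric]); lower is better for both."""
    front = []
    for r in rows:
        dominated = any(
            o[1] <= r[1]
            and o[metric] <= r[metric]
            and (o[1] < r[1] or o[metric] < r[metric])
            for o in rows
        )
        if not dominated:
            front.append(r[0])
    return front


print(pick_quant(192)[0])  # 176 GiB left for weights -> cpral 3.55bpw
```

On the KL(q→o) column every quant in the table is Pareto-optimal. Using PPL instead (metric=3) drops the 3.7bpw, 4bpw and 5bpw rows, although their PPL differences are only a few thousandths.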

Methodology

The methodology I used to create the custom quants is documented in https://github.com/adamo1139/qwen397b-exl3 and is mostly reproducible (I may have made minor manual overrides to some auto-generated configs). The custom override configs are included in the model repository of every quant produced with this method. I reused Goldkoron's per-module sensitivity data from Qwen 3.5 397B in the hope that this model would have similar sensitivity, but that does not quite seem to be the case: the custom-optimized quants do not outperform the larger baseline quants and only land between neighboring baselines.
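The linked repository documents the actual procedure. The snippet below is only a generic illustration of how per-module sensitivity data can drive bit allocation under a size budget; it is not the method used for these quants. The module names, parameter counts and KL estimates are invented, and the superlinear penalty is not modeled. Each module starts at the lowest candidate bitwidth and is upgraded one step at a time, always taking the upgrade with the largest estimated KL reduction per extra byte that still fits the budget.

```python
def allocate_bits(modules, bit_options, budget_bytes):
    """Greedy sensitivity-guided allocation (hypothetical illustration).

    modules: dict name -> (num_params, {bits: estimated_kl_contribution})
    bit_options: candidate bitwidths, e.g. those of available baseline quants
    """
    bit_options = sorted(bit_options)
    choice = {name: bit_options[0] for name in modules}
    used = sum(params * bit_options[0] / 8 for params, _ in modules.values())
    while True:
        best = None
        for name, (params, kl) in modules.items():
            idx = bit_options.index(choice[name])
            if idx + 1 == len(bit_options):
                continue  # already at the highest candidate bitwidth
            nxt = bit_options[idx + 1]
            extra = params * (nxt - choice[name]) / 8  # extra bytes for this upgrade
            if used + extra > budget_bytes:
                continue  # upgrade would exceed the size budget
            gain = (kl[choice[name]] - kl[nxt]) / extra  # KL saved per extra byte
            if best is None or gain > best[0]:
                best = (gain, name, nxt, extra)
        if best is None:
            return choice  # no affordable upgrade left
        _, name, nxt, extra = best
        choice[name] = nxt
        used += extra


# Invented example: a small, sensitive module and a large, less sensitive one.
example = {
    "attn": (1_000, {2: 0.10, 3: 0.04, 4: 0.02}),
    "experts": (8_000, {2: 0.20, 3: 0.12, 4: 0.08}),
}
print(allocate_bits(example, [2, 3, 4], budget_bytes=3_500))
```

The greedy rule spends bits first where they buy the most KL reduction per byte, which is why the small, sensitive module is upgraded before the large one.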

Credits

Thanks to @mratsim for sharing his methodology. Thanks to @Goldkoron for sharing the per-module KLD sensitivity chart for Qwen 3.5 397B. Thanks to @turboderp for creating exllamav3.

Potential for future work

Future work could yield better quants by redoing the sensitivity analysis for Nex N2 Pro, tweaking the superlinear penalty, incorporating 6bpw and 7bpw baselines, and quantizing individual experts to different bitwidths as informed by REAP/REAM data. The methodology used here should also apply to other MoE models, such as the GLM 4.5 family and Qwen 3.5 122B, and it might carry over to the GGUF ecosystem as well.

TODO

Quantization_bits metadata is misleading: it is a constant 3.00 because it was copied over from one of the baseline quants. head_bits may be incorrect in the same way.
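Until the metadata is fixed, a rough cross-check is to compute the average bits per weight from the tensor byte ranges stored in the .safetensors headers. This sketch relies only on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header whose entries carry data_offsets). The parameter count must be supplied by the caller, because packed quantized tensors do not expose logical parameter counts. The result includes unquantized tensors such as embeddings and norms, so it will not exactly match the bpw column above.

```python
import json
import struct
from pathlib import Path


def tensor_bytes(path):
    """Sum the byte ranges of all tensors listed in one .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    return sum(
        info["data_offsets"][1] - info["data_offsets"][0]
        for name, info in header.items()
        if name != "__metadata__"  # optional string map, not a tensor
    )


def average_bpw(model_dir, num_params):
    """Average bits per weight over every .safetensors file in model_dir."""
    total = sum(tensor_bytes(p) for p in Path(model_dir).glob("*.safetensors"))
    return total * 8 / num_params
```

For example, average_bpw(local_dir, 397e9) uses the nominal 397B parameter count from the model name, where local_dir is wherever the quant was downloaded.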

Model tree for cpral/Nex-N2-Pro-397B-A17B-exl3

Base model: nex-agi/Nex-N2-Pro (this model is one of its 21 quantized derivatives)