How does this style of quant stack up to the existing MXFP4 quant?

#1
by harfum8 - opened

Do you have benchmark data to compare?

MLX Community org

No, it's complex to generate one. If you know how to, or have the time to do it, please do.

Seems like a lot of work to generate all these quants and have no data to back up the thesis. Thanks for contributing them though
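
For anyone who wants to produce the comparison asked for above, one common yardstick is perplexity on a fixed evaluation text. The snippet below is only a minimal sketch of that metric. It assumes you have already collected per-token natural-log probabilities from each quantized model, and how you collect them depends on your runtime and is not shown. The function name `perplexity` is illustrative, not part of any library.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the evaluated tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    reference token. Lower perplexity means the model fits the text better.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Sanity check: a model that gives every token probability 1/4
# has a perplexity of about 4.
uniform = [math.log(0.25)] * 10
print(f"perplexity = {perplexity(uniform):.3f}")
```

Running the same evaluation text through two quants of the same base model and comparing the two numbers would give the kind of data this thread is missing.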

MLX Community org

But it's well known that a higher-bit quant is better:

An 8-bit quant is better but slower in benchmarks than any quant below 8 bits, namely:
6-bit, 5-bit, 4-bit, 3-bit, 2-bit, or anything lower

A 6-bit quant is better but slower in benchmarks than any quant below 6 bits, namely:
5-bit, 4-bit, 3-bit, 2-bit, or anything lower

A 5-bit quant is better but slower in benchmarks than any quant below 5 bits, namely:
4-bit, 3-bit, 2-bit, or anything lower

A 4-bit quant is better but slower in benchmarks than any quant below 4 bits, namely:
3-bit, 2-bit, or anything lower

A 3-bit quant is better but slower in benchmarks than any quant below 3 bits, namely:
2-bit, or anything lower

A 2-bit quant is better but slower in benchmarks than any quant below 2 bits, namely:
anything lower
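
The "more bits is better" part of that claim can be illustrated with a toy round trip. This is a minimal sketch assuming a simple symmetric uniform quantizer over one group of synthetic weights. It is not the MXFP4 format and not the scheme MLX actually uses, and it measures only reconstruction error, not speed or benchmark quality. The function names are made up for the example.

```python
import random

def quantize_roundtrip(values, bits):
    """Quantize one group to signed integers with `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1                       # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / levels or 1.0  # fall back to 1.0 for an all-zero group
    return [round(v / scale) * scale for v in values]

def mean_abs_error(values, bits):
    """Average absolute difference between original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(64)]  # synthetic weights, not from a real model

errors = {bits: mean_abs_error(weights, bits) for bits in (2, 3, 4, 5, 6, 8)}
for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error = {err:.5f}")
```

Lower reconstruction error does not automatically mean better benchmark scores, which is why measured data would still be useful here.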

I agree, I want q8 for the "thinking" layers if they'll fit on my 512G M3U. Just would love to see benchmark data. What is your favorite model for 512 M3U right now?
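
Whether a given bit width fits in 512GB is mostly arithmetic. Here is a rough sketch, assuming weight storage dominates and ignoring quantization scales, the KV cache, activations, and whatever share of unified memory the OS reserves. The parameter count is a hypothetical placeholder, not the size of any model mentioned in this thread.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1e9 bytes) at a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9

def fits(num_params, bits_per_weight, budget_gb=512):
    """True if the weights alone fit in the memory budget."""
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb

num_params = 400e9  # hypothetical parameter count, for illustration only
for bits in (16, 8, 6, 4):
    size = weight_memory_gb(num_params, bits)
    print(f"{bits}-bit: ~{size:.0f} GB, fits in 512 GB: {fits(num_params, bits)}")
```

In practice you would want real headroom below the budget for context and runtime overhead.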

MLX Community org

Good answer!

FULL-ANSWER:

My favorite model for the 512 M3U would be:

Nex-N2-Pro

Its benchmarks compared to other models are fire!!

But an MLX version of it is still on my to-do list.

It would take me a lot more time than it would take you, though, because you have 512GB.

I can show you how to do it and how to publish it on Hugging Face for other people...

The good news is that I have already made the MLX version of Nex-N2-mini.

On that machine you can run it at full precision without any data loss...

You can find it here:

usermma/Nex-N2-mini-mlx-fp16

This is fp16, which means 16-bit weights without any quantization...

Enjoy it.

MLX Community org
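
| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| **Agent** | | | | | | | | |
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| **Coding & SWE** | | | | | | | | |
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| **General & Reasoning** | | | | | | | | |
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |
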
MLX Community org

I did not make this table; I just copied and pasted it from
https://huggingface.co/nex-agi/Nex-N2-Pro
