Info on this Version

An mlx-optiq quantized version of nex-agi/Nex-N2-mini, only from 70.2 GB its size was reduced to... 17.8! For static version, there's optiq-mixed variation, 37 GB, but, i added nex-n2.txt where the same user prompt/system instruction are sent to both versions and... honestly, i see no sense in adding 37GB version here, for they are roughtly the same. While the reduction from 70 to 17 is usually expected to have, as an outcome, the death of 'brain' in a model, optiq quantization, albeit aggressiveness, makes local inference with low mem peak as accessible as it has never been! See for yourself. Smart quantization and Open Weights ethos are the future killers of mainstream big tech monoculture monstrosities!

Yes, its 'Need maybe' in reasoning may be slightly annoying, and yet, no parasitic words bleed into the real output for user, as well as the model is adherent to system instructions concerning formatting of the text which you can see from the text file (strict adherence to system prompt is perhaps among my most cherished virtues of qwen models and their derivations, which this model is as well, with ssm layers mutations, if config is to be believed)

Note on mlx-optiq version: I use old version, not 0.2.x they have now, not even 0.1.x, but 0.0.11 in venv. Why? Because monkey-patching --sensitivity flag with 'streaming' option each time i update the package would suck, but what REALLY, like, REALLY sucks, is the way the quantization takes place in the latest version: there is no actual RAM sparing, it can process 14/596 layers in some 14-16 hours and THEN OOM, and this is considered to be optimized for local devices (like mine M1 Pro 64 GB URAM/24 GPU cores)... It is NOT even suboptimal, unlike old, 0.0.11 version's streaming option: less than 5 minutes and you get it done as you see it in this repo. Just an information someone may find useful when trying to quantize with mlx-optiq. Aside from that, its advantages over my two otherwise fav quantizations JANG-Q and turboquant-MLX-full is that it requires no additional import or separate cli args/commands for terminal, just basic mlx_lm.* – i am pissed off by packages that eventually build on mlx_lm yet would require to import their shit to basic functions or call their functions because otherwise their quantized models won't generate, raising the errors 'name_of_method_package' missing! mlx-optiq builds on mlx_lm as well but SMARTLY, hiding all monkey patching into the pipeline so that it is noticeable only if you decide to walk through the .py files in the package library, but no visible actions come to sight otherwise, no latency added! a lesson for everyone who's in it

Downloads last month
159
Safetensors
Model size
35B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mlx-community/Nex-N2-mini-mlx-optiq-static-mixed-3_6bits

Quantized
(51)
this model