Nex-N2-mini, 8-bit MLX

This is nex-agi/Nex-N2-mini converted to MLX format and quantized to 8 bits (group size 64) with mlx-lm 0.31.3.

Nex-N2-mini is an agentic model built around what its authors call Agentic Thinking: it interleaves reasoning, tool use, and environment feedback rather than treating them as separate stages. The architecture is a hybrid MoE (qwen3_5_moe): 40 layers alternating linear attention with full attention every fourth layer, 256 experts with 8 active per token, and a 262k-token context window.

The original checkpoint includes a vision tower. MLX text inference does not use it, so the vision weights were dropped during conversion; this copy is text-only. Expect roughly 37 GB of memory in use during inference.

Usage

With mlx-lm, either directly:

mlx_lm.generate --model jedisct1/Nex-N2-mini-mlx-8bit --prompt "Hello"

or as an OpenAI-compatible server:

mlx_lm.server --model jedisct1/Nex-N2-mini-mlx-8bit

It also works out of the box with oMLX.

Tool calling works without any extra configuration. The chat template uses the Qwen3-Coder XML style, which mlx-lm and oMLX both detect automatically, so servers return proper structured tool_calls, and thinking ends up in the reasoning field instead of leaking into the response content. Tested end to end with Swival as the harness, including multi-step tasks that exercise file edits, search, and shell commands while the model is thinking.

Downloads last month
53
Safetensors
Model size
35B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jedisct1/Nex-N2-mini-mlx-8bit

Quantized
(45)
this model

Collection including jedisct1/Nex-N2-mini-mlx-8bit