Nex-N2-mini-MLX-VLM-8bit-MTP

Native MLX-VLM 8-bit quantized version of nex-agi/Nex-N2-mini, with a grafted multi-token prediction (MTP) head for speculative decoding with oMLX Native MTP.

Summary

Base model: nex-agi/Nex-N2-mini
Format: native MLX / MLX-VLM
Quantization: 8-bit MLX-VLM quantization
Vision: supported
MTP: included
Target runtime: oMLX with Native MTP enabled
Direct mlx-vlm.generate: not a supported runtime for this MTP variant (use the non-MTP variant instead)

What changed in this version

This repository takes its native MLX-VLM trunk and vision weights from joowon-jang/Nex-N2-mini-MLX-VLM-8bit and grafts on the MTP head from jedisct1/Nex-N2-mini-mlx-OptiQ-8bit-MTP.

This is not the same as jedisct1/Nex-N2-mini-mlx-OptiQ-8bit-MTP:

This repo keeps the native MLX-VLM layout.
This repo includes the vision tower / VLM weights.
The jedisct1 OptiQ MTP repository is text-only.
The MTP head is used only for speculative decoding acceleration.

Important compatibility note

This MTP variant includes language_model.mtp.* weights intended for oMLX Native MTP.

The validated runtime is:

oMLX
Native MTP enabled in model settings

Plain mlx-vlm.generate or generic MLX loading paths may reject the extra language_model.mtp.* tensors during strict loading, or may load the model without actually using the MTP head, because those tensors are intended for the oMLX Native MTP runtime.

For direct mlx-vlm.generate usage, use the non-MTP variant instead:

joowon-jang/Nex-N2-mini-MLX-VLM-8bit
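
To check which variant a local download is, one option is to look for the language_model.mtp.* tensor names directly in the safetensors headers. The following is a minimal sketch, not part of oMLX or mlx-vlm: the function names are hypothetical, and it assumes the weights are stored as one or more *.safetensors files in a single directory. It relies only on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header), so no weights are loaded.

```python
import json
import struct
from pathlib import Path
from typing import List

# Prefix named in this model card for the grafted MTP head.
MTP_PREFIX = "language_model.mtp."


def read_tensor_names(path) -> List[str]:
    """Return tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional non-tensor entry in the header.
    return [name for name in header if name != "__metadata__"]


def find_mtp_tensors(model_dir) -> List[str]:
    """Collect tensor names under MTP_PREFIX across all shards in model_dir."""
    found: List[str] = []
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        found.extend(n for n in read_tensor_names(shard) if n.startswith(MTP_PREFIX))
    return found
```

An empty result suggests the directory holds a non-MTP variant; a non-empty result means a strict loader that does not know about the MTP head may reject the checkpoint.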

Quality and behavior

The MTP head is used for speculative decoding. Draft tokens are verified by the main Nex-N2-mini trunk before being accepted, so the MTP head is intended to affect speed rather than final output quality.
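
The draft-then-verify idea can be illustrated with a toy loop. This is a simplified sketch of greedy (temperature 0) speculative decoding, not the oMLX implementation: toy_target and toy_draft are hypothetical stand-ins for the trunk and the MTP head, and a real runtime verifies all draft positions in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, max_draft: int = 2) -> List[int]:
    """Greedy speculative decoding: keep a draft token only if the target agrees."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The draft model proposes up to max_draft tokens.
        proposal: List[int] = []
        for _ in range(max_draft):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed token in order.
        for tok in proposal:
            expected = target(seq)
            if tok != expected:
                seq.append(expected)  # reject: use the target's token, drop the rest
                break
            seq.append(tok)           # accept: the draft matched the target
            if len(seq) >= goal:
                break
        else:
            # Every draft token was accepted; the target adds one more token.
            seq.append(target(seq))
    return seq[:goal]


# Hypothetical toy models: the draft disagrees with the target at some positions.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + len(seq)) % 11


def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 11
```

Because every token that ends up in the sequence is one the target would have chosen itself, the speculative output matches plain greedy decoding in this toy setting; only the number of verification rounds changes.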

In practice, MTP speedups depend on draft acceptance rate. The grafted head tends to help more on normal prose and reasoning, and less on unusual token sequences.
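
As a rough way to reason about this, the expected number of tokens produced per verification step can be estimated from the acceptance rate. The sketch below assumes, as a simplification, that each draft token is accepted independently with the same probability; real acceptance varies by position and content, and the function name is hypothetical.

```python
def expected_tokens_per_step(accept_rate: float, max_draft: int) -> float:
    """Expected tokens per verification step under i.i.d. draft acceptance.

    Draft token i is kept only if all earlier ones were kept, and the target
    always contributes one token, so the expectation is
    1 + a + a^2 + ... + a^max_draft.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if max_draft < 0:
        raise ValueError("max_draft must be non-negative")
    return sum(accept_rate ** i for i in range(max_draft + 1))


# Illustrative value only: an assumed 80% acceptance rate with 2 draft tokens.
print(f"{expected_tokens_per_step(0.8, 2):.2f}")  # prints 2.44
```

This counts tokens per step, not wall-clock speedup: the cost of running the MTP head and of verifying rejected drafts is not modeled, so measured gains will be lower than this figure.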

Recommended runtime

Use oMLX and enable Native MTP.

Suggested initial settings:

Native MTP: enabled
Max Draft Tokens: 2
Min Draft Tokens: 1
Temperature: 0 for benchmarking
Use the same prompt, context length, and max tokens when comparing against non-MTP variants
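
When comparing against a non-MTP variant, it helps to record the settings alongside the timings so that mismatched runs are caught before a speedup is computed. This is a minimal bookkeeping sketch with hypothetical names (RunResult, mtp_speedup); it does not call oMLX, and the token counts and timings have to come from your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One measured generation run (all values supplied by the user)."""
    prompt: str
    context_length: int
    max_tokens: int
    temperature: float
    generated_tokens: int
    seconds: float

    @property
    def tokens_per_second(self) -> float:
        return self.generated_tokens / self.seconds

    def settings(self) -> tuple:
        # The fields that must match for a fair comparison.
        return (self.prompt, self.context_length, self.max_tokens, self.temperature)


def mtp_speedup(baseline: RunResult, mtp: RunResult) -> float:
    """Throughput ratio of the MTP run over the non-MTP baseline."""
    if baseline.settings() != mtp.settings():
        raise ValueError("runs use different prompt, context length, max tokens, or temperature")
    return mtp.tokens_per_second / baseline.tokens_per_second
```

A ratio above 1.0 means the MTP run generated tokens faster under identical settings; repeating each run a few times and comparing medians reduces noise.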

Notes

This is not an OptiQ oQ8 sidecar model. The model uses a native MLX-VLM layout with vision_tower.* weights included in the model files.

MTP head attribution:

MTP head source: jedisct1/Nex-N2-mini-mlx-OptiQ-8bit-MTP
Original base model: nex-agi/Nex-N2-mini
Donor MTP lineage described by the source model: Qwen3.5-35B-A3B MTP head grafted onto Nex-N2-mini-compatible dimensions

License

Apache-2.0, following the base model and referenced MTP-head source license.
