Nex-N2-mini-MLX-VLM-8bit-MTP
Native MLX-VLM 8-bit quantized version of nex-agi/Nex-N2-mini, with a grafted MTP head for oMLX Native MTP speculative decoding.
Summary
- Base model: nex-agi/Nex-N2-mini
- Format: native MLX / MLX-VLM
- Quantization: 8-bit MLX-VLM quantization
- Vision: supported
- MTP: included
- Target runtime: oMLX with Native MTP enabled
- Direct mlx-vlm.generate: not the supported runtime for this MTP variant
What changed in this version
This repository takes the native MLX-VLM trunk and vision weights from Nex-N2-mini-MLX-VLM-8bit and grafts on the MTP head from jedisct1/Nex-N2-mini-mlx-OptiQ-8bit-MTP.
This is not the same as jedisct1/Nex-N2-mini-mlx-OptiQ-8bit-MTP:
- This repo keeps the native MLX-VLM layout.
- This repo includes the vision tower / VLM weights.
- The jedisct1 OptiQ MTP repository is text-only.
- The MTP head is used only for speculative decoding acceleration.
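To make the idea of grafting concrete, here is a minimal sketch that merges two weight dictionaries by key prefix. It is not the procedure used to build this repository. The function name, the toy tensor names, and the assumption that the donor uses the same language_model.mtp. prefix are all hypothetical, and a real graft would also have to reconcile quantization settings and tensor shapes.

```python
def graft_mtp_head(base_weights, donor_weights, mtp_prefix="language_model.mtp."):
    """Return base_weights plus the donor tensors whose names start with mtp_prefix.

    Hypothetical helper for illustration; both arguments are plain dicts
    mapping tensor names to tensors.
    """
    # Refuse to overwrite: the base is expected to carry no MTP tensors itself.
    clash = [name for name in base_weights if name.startswith(mtp_prefix)]
    if clash:
        raise ValueError(f"base already contains MTP tensors: {clash[:3]}")

    merged = dict(base_weights)  # trunk and vision tower are copied unchanged
    for name, tensor in donor_weights.items():
        if name.startswith(mtp_prefix):
            merged[name] = tensor  # only the MTP head comes from the donor
    return merged


# Toy example: strings stand in for tensors, and the names are made up.
base = {
    "language_model.model.layers.0.weight": "trunk",
    "vision_tower.blocks.0.weight": "vision",
}
donor = {
    "language_model.model.layers.0.weight": "donor trunk (ignored)",
    "language_model.mtp.fc.weight": "mtp",
}
merged = graft_mtp_head(base, donor)
print(sorted(merged))
```

The donor trunk is ignored on purpose, which mirrors the point that this repository keeps its own MLX-VLM trunk and vision weights.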
Important compatibility note
This MTP variant includes language_model.mtp.* weights intended for oMLX Native MTP.
The validated runtime is:
- oMLX
- Native MTP enabled in model settings
Plain mlx-vlm.generate or generic MLX loading paths may fail strict loading or may not use the MTP head correctly, because the preserved language_model.mtp.* tensors are intended for the oMLX Native MTP runtime.
For direct mlx-vlm.generate usage, use the non-MTP variant instead:
joowon-jang/Nex-N2-mini-MLX-VLM-8bit
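To check whether a downloaded weight file carries the language_model.mtp.* tensors described above, the tensor names can be read directly from the safetensors header without loading any weights. The sketch below assumes the weights are stored as .safetensors files. The helper names are illustrative and not part of oMLX or mlx-vlm.

```python
import json
import struct


def read_safetensors_names(path):
    """Return the tensor names listed in the header of a .safetensors file."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of JSON describing every tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def summarize(names):
    """Count tensors by the two prefixes this model card mentions."""
    known = ("language_model.mtp.", "vision_tower.")
    return {
        "mtp": sum(n.startswith("language_model.mtp.") for n in names),
        "vision": sum(n.startswith("vision_tower.") for n in names),
        "other": sum(not n.startswith(known) for n in names),
    }
```

Calling summarize(read_safetensors_names(path)) on each weight shard should report a non-zero "mtp" count for at least one shard of this variant and zero for the non-MTP variant, if both follow the naming described in this card.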
Quality and behavior
The MTP head is used for speculative decoding. Draft tokens are verified by the main Nex-N2-mini trunk before being accepted, so the MTP head is intended to affect speed rather than final output quality.
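The verification step can be sketched for the greedy (temperature 0) case as follows. This is a toy illustration, not oMLX's implementation: draft_next and target_next are hypothetical callables that stand in for the MTP head and the main trunk, and a real runtime verifies all draft tokens in one batched forward pass rather than one call per token.

```python
def verify_draft(context, draft, target_next):
    """Greedy verification: keep draft tokens only while the main model agrees.

    target_next(tokens) returns the main model's greedy next token.
    Returns the tokens emitted in this step.
    """
    emitted = []
    for token in draft:
        expected = target_next(context + emitted)
        if token != expected:
            # First mismatch: drop the rest of the draft and emit the
            # main model's own token instead.
            emitted.append(expected)
            return emitted
        emitted.append(token)
    # Every draft token matched, so the main model adds one more token.
    emitted.append(target_next(context + emitted))
    return emitted


def speculative_generate(context, draft_next, target_next, max_new_tokens, max_draft=2):
    out = list(context)
    while len(out) - len(context) < max_new_tokens:
        draft = []
        for _ in range(max_draft):
            draft.append(draft_next(out + draft))
        out += verify_draft(out, draft, target_next)
    return out[len(context):][:max_new_tokens]


def greedy_generate(context, target_next, max_new_tokens):
    out = list(context)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out[len(context):]


# Toy models over integer tokens: the draft is wrong whenever the last token is 4.
target_next = lambda toks: (toks[-1] + 1) % 10
draft_next = lambda toks: 0 if toks[-1] == 4 else (toks[-1] + 1) % 10

spec = speculative_generate([0], draft_next, target_next, 12)
plain = greedy_generate([0], target_next, 12)
print(spec == plain)
```

Because every emitted token is either confirmed or produced by the main model, the speculative output matches plain greedy decoding in this sketch even when the draft is wrong. Only the number of verification steps changes.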
In practice, MTP speedups depend on draft acceptance rate. The grafted head tends to help more on normal prose and reasoning, and less on unusual token sequences.
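A rough way to see how acceptance rate drives the benefit is to compute the expected number of tokens emitted per verification step. The sketch below assumes each draft token is accepted independently with the same probability, which is a simplification. The rates used are illustrative values, not measurements of this model.

```python
def expected_tokens_per_step(acceptance_rate, max_draft_tokens):
    """Expected tokens emitted per verification step.

    Simplifying assumption: every draft token is accepted independently
    with probability acceptance_rate.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    # Draft token i is emitted only if all earlier draft tokens were accepted,
    # and the main model always contributes one token of its own:
    # 1 + a + a^2 + ... + a^k for k draft tokens.
    return sum(acceptance_rate ** i for i in range(max_draft_tokens + 1))


for rate in (0.3, 0.7, 0.9):  # illustrative acceptance rates
    print(rate, round(expected_tokens_per_step(rate, max_draft_tokens=2), 3))
```

This counts tokens per step, not wall-clock speedup: running the MTP head and verifying the extra positions has its own cost, so the measured gain will be lower than this figure.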
Recommended runtime
Use oMLX and enable Native MTP.
Suggested initial settings:
- Native MTP: enabled
- Max Draft Tokens: 2
- Min Draft Tokens: 1
- Temperature: 0 for benchmarking
- Use the same prompt, context length, and max tokens when comparing against non-MTP variants
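A like-for-like comparison along these lines can be sketched with a small timing harness. Everything runtime-specific is left abstract: generate_fn is a hypothetical callable that you would wire to your own oMLX setup, it must return the number of generated tokens, and the max_tokens value is an arbitrary placeholder.

```python
import statistics
import time

# Temperature 0 follows the suggestion above; max_tokens is a placeholder.
BENCH_SETTINGS = {"temperature": 0.0, "max_tokens": 256}


def tokens_per_second(generate_fn, prompt, settings, runs=3):
    """Median throughput of generate_fn over several runs.

    generate_fn(prompt, **settings) is a hypothetical callable that runs one
    generation and returns how many tokens it produced.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate_fn(prompt, **settings)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)
    return statistics.median(rates)


def compare(mtp_fn, baseline_fn, prompt, settings=BENCH_SETTINGS):
    """Run both variants with the same prompt and settings."""
    mtp = tokens_per_second(mtp_fn, prompt, settings)
    base = tokens_per_second(baseline_fn, prompt, settings)
    return {"mtp_tok_s": mtp, "baseline_tok_s": base, "speedup": mtp / base}
```

Passing the same prompt and settings dictionary to both callables keeps the two variants on identical inputs, and taking the median over a few runs reduces the influence of a slow first run.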
Notes
This is not an OptiQ oQ8 sidecar model. The model uses a native MLX-VLM layout with vision_tower.* weights included in the model files.
MTP head attribution:
- MTP head source: jedisct1/Nex-N2-mini-mlx-OptiQ-8bit-MTP
- Original base model: nex-agi/Nex-N2-mini
- Donor MTP lineage described by the source model: Qwen3.5-35B-A3B MTP head grafted onto Nex-N2-mini-compatible dimensions
License
Apache-2.0, following the base model and referenced MTP-head source license.