Nex-N2-Pro-GGUF
Overview
This repository contains the GGUF quantized files for nex-agi/Nex-N2-Pro.
- Original Model: nex-agi/Nex-N2-Pro
- Architecture: Qwen3.5-397B-A17B
- License: Apache 2.0
- MTP Support: MTP donor is unsloth/Qwen3.5-397B-A17B-MTP-GGUF
| Quant Type | Size | Description |
|---|---|---|
| IQ1+ | 100 GB | Mixed Precision for Better Quality |
| IQ2_XS | 142 GB | Mixed Precision for Better Quality |
| Q2_K | 158 GB | Standard llama.cpp quantization |
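As a rough aid for choosing among the files above, the sketch below picks the largest quant that fits a given memory budget. The sizes are copied from the table; the `pick_quant` helper, the 16 GB headroom default (meant to leave room for the KV cache and runtime buffers), and the idea that a larger file is the better choice are all assumptions for illustration, not recommendations from the model authors.

```python
# File sizes in GB, copied from the quant table above.
QUANT_SIZES_GB = {"IQ1+": 100, "IQ2_XS": 142, "Q2_K": 158}


def pick_quant(available_gb, headroom_gb=16.0):
    """Return the largest quant that fits in memory, or None if none fits.

    headroom_gb is a hypothetical allowance for the KV cache and runtime
    buffers; tune it for your context length and hardware.
    """
    budget = available_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    # Assumption: among the quants that fit, the largest file is preferred.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for memory_gb in (96, 128, 160, 192):
        print(memory_gb, "GB ->", pick_quant(memory_gb))
```

Actual memory use depends on context size, offloading settings, and the runtime, so treat the result as a starting point only.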
An agentic model with Agentic Thinking.
Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2 — an agent model built for real-world productivity scenarios. With first-tier coding and agentic capabilities, Nex-N2 keeps driving complex, long-horizon tasks forward in real environments to deliver stable, end-to-end results.
Over the past year, a paradigm shift led by Vibe Coding and Harness Engineering has been redefining the limits of LLM agents. From dialogue, to reasoning, to agents that execute long-horizon tasks with environmental feedback, the tasks models must handle keep growing harder, the contexts longer, and the environments more realistic. The core of next-generation model competition is no longer whether a model can think, but whether it can reliably and efficiently turn thinking into actions that are executable, verifiable, and iterable.
Rather than treating reasoning, tool use, and environment execution as separate capabilities, Nex-N2 unifies them through an Agentic Thinking framework that connects requirement understanding, task planning, code implementation, environmental feedback, evaluation and debugging, and continuous iteration into a single closed loop. The framework has two parts:
- Adaptive Thinking lets the model decide on its own when to think and how deeply — executing simple actions quickly while reasoning thoroughly on critical decisions.
- Coherent Thinking applies one reasoning paradigm to both general reasoning and diverse agentic tasks, keeping behavior consistent across tasks and modalities to enable stable capability transfer.
Across real agentic workflows — agentic coding, deep research, tool calling, and terminal execution — Nex-N2 reaches first-tier performance, with substantial gains over the previous-generation Nex-N1 on multiple authoritative benchmarks. In real productivity scenarios such as OpenClaw one-person-company workflows, end-to-end game development, and web and multimodal generation, it likewise demonstrates outstanding usability, robustness, and stability.
Performance
| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| Agent | | | | | | | | |
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| Coding & SWE | | | | | | | | |
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| General & Reasoning | | | | | | | | |
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |
How to Use
These GGUF files work with llama.cpp and with popular tools built on it, such as LM Studio and Ollama.
Example using llama.cpp CLI:
```bash
./llama-cli -m nex-n2-pro-Q2_K-00001-of-00023.gguf \
  -p "Hello, how are you?" \
  -sys "You are a helpful AI" \
  -n 4096 \
  -c 8192
```
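The command above points at the first shard of a split GGUF. As far as we know, llama.cpp then looks for the remaining shards in the same directory, so an incomplete download usually fails at load time. The sketch below checks for missing shards beforehand. The `shard_names` and `missing_shards` helpers are hypothetical, and the prefix and shard count are taken from the example filename; other quants may use a different count.

```python
from pathlib import Path


def shard_names(prefix, total):
    """Build the expected filenames for a GGUF split into `total` shards."""
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(directory, prefix, total):
    """Return the expected shard filenames that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in shard_names(prefix, total) if not (folder / name).is_file()]


if __name__ == "__main__":
    # Prefix and shard count assumed from the example filename above.
    missing = missing_shards(".", "nex-n2-pro-Q2_K", 23)
    if missing:
        print(f"{len(missing)} shard(s) missing, first: {missing[0]}")
    else:
        print("All shards present.")
```

Run it from the directory that holds the downloaded files, or pass that directory as the first argument to `missing_shards`.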