Instructions for using morikomorizz/Nex-N2-Pro-MTP-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

Transformers

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="morikomorizz/Nex-N2-Pro-MTP-GGUF")
# The text-generation pipeline takes text-only chat messages as its first positional argument
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("morikomorizz/Nex-N2-Pro-MTP-GGUF", dtype="auto")

llama-cpp-python

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="morikomorizz/Nex-N2-Pro-MTP-GGUF",
	filename="IQ1+/nex-n2-pro-IQ1+-00001-of-00023.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
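
The `filename` above names only the first of 23 shards. The sketch below expands that name into the full shard list so the remaining files can be requested as well. It assumes the shards follow the `-00001-of-00023.gguf` pattern visible in the first filename and that the installed llama-cpp-python version accepts an `additional_files` argument on `Llama.from_pretrained`; `shard_names` and `load_split_model` are hypothetical helper names, not part of the library.

```python
def shard_names(first_shard: str) -> list[str]:
    """Expand 'name-00001-of-000NN.gguf' into the names of all NN shards."""
    parts = first_shard.rsplit("-00001-of-", 1)
    if len(parts) != 2:
        raise ValueError(f"not a first-shard filename: {first_shard!r}")
    stem, suffix = parts                      # suffix looks like '00023.gguf'
    total_str, ext = suffix.split(".", 1)
    width = len(total_str)                    # keep the zero padding of the original
    return [
        f"{stem}-{i:0{width}d}-of-{total_str}.{ext}"
        for i in range(1, int(total_str) + 1)
    ]


def load_split_model(repo_id: str, first_shard: str):
    """Download every shard, then load the model from the first one."""
    from llama_cpp import Llama  # imported lazily so shard_names works without the package

    names = shard_names(first_shard)
    return Llama.from_pretrained(
        repo_id=repo_id,
        filename=names[0],
        additional_files=names[1:],  # assumption: supported by the installed version
    )
```

For example, `load_split_model("morikomorizz/Nex-N2-Pro-MTP-GGUF", "IQ1+/nex-n2-pro-IQ1+-00001-of-00023.gguf")` would request all 23 files before loading.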

Local Apps

llama.cpp

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS
# Run inference directly in the terminal:
llama cli -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS
# Run inference directly in the terminal:
llama cli -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS
# Run inference directly in the terminal:
./llama-cli -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS
# Run inference directly in the terminal:
./build/bin/llama-cli -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Use Docker

docker model run hf.co/morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS


vLLM

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "morikomorizz/Nex-N2-Pro-MTP-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "morikomorizz/Nex-N2-Pro-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
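
The curl call above can also be made from Python with only the standard library. This is a minimal sketch, assuming the server returns the usual OpenAI-style response with the reply under `choices[0].message.content`; the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant reply."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

With the vLLM server from this section running, `chat("http://localhost:8000/v1", "morikomorizz/Nex-N2-Pro-MTP-GGUF", "What is the capital of France?")` sends the same request as the curl command. The same helper should work against any other OpenAI-compatible server on this page by changing `base_url`.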

Use Docker

docker model run hf.co/morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

SGLang

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "morikomorizz/Nex-N2-Pro-MTP-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "morikomorizz/Nex-N2-Pro-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "morikomorizz/Nex-N2-Pro-MTP-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "morikomorizz/Nex-N2-Pro-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
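
The requests above return the whole answer at once. OpenAI-compatible servers generally also stream tokens as server-sent events when the request body sets `"stream": true`. The sketch below shows how those lines could be decoded, assuming the standard `data: {...}` chunk format terminated by `data: [DONE]`; `iter_stream_text` is a hypothetical helper, and the exact chunk fields may differ between servers.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank separators and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break                         # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:                      # the first chunk often carries only the role
                yield text
```

When reading from a live HTTP response, each line would first be decoded from bytes to `str` before being passed in.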

Ollama
How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Ollama:
```
ollama run hf.co/morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS
```

Unsloth Studio

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for morikomorizz/Nex-N2-Pro-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for morikomorizz/Nex-N2-Pro-MTP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for morikomorizz/Nex-N2-Pro-MTP-GGUF to start chatting

Pi

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS"
        }
      ]
    }
  }
}
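
If `~/.pi/agent/models.json` already contains other providers, pasting the block above by hand risks overwriting them. The following sketch merges the `llama-cpp` provider into an existing file instead. The field names come from the JSON above; `merge_llama_cpp_provider` and `write_models_json` are hypothetical helpers, and the sketch assumes Pi reads no fields beyond the ones shown.

```python
import json
from pathlib import Path


def merge_llama_cpp_provider(config: dict, model_id: str,
                             base_url: str = "http://localhost:8080/v1") -> dict:
    """Add or update the llama-cpp provider, keeping other providers intact."""
    providers = config.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {})
    provider.update({"baseUrl": base_url, "api": "openai-completions", "apiKey": "none"})
    models = provider.setdefault("models", [])
    if not any(entry.get("id") == model_id for entry in models):
        models.append({"id": model_id})   # avoid duplicate entries on repeated runs
    return config


def write_models_json(path: Path, model_id: str) -> None:
    """Read the existing config if present, merge the provider, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    merge_llama_cpp_provider(config, model_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
```

A call such as `write_models_json(Path.home() / ".pi" / "agent" / "models.json", "morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS")` would produce the configuration shown above when no file exists yet.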

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Run Hermes

hermes

Docker Model Runner
How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS
```

Lemonade

How to use morikomorizz/Nex-N2-Pro-MTP-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull morikomorizz/Nex-N2-Pro-MTP-GGUF:IQ2_XS

Run and chat with the model

lemonade run user.Nex-N2-Pro-MTP-GGUF-IQ2_XS

List all available models

lemonade list

Nex-N2-Pro-GGUF

Overview

This repository contains the GGUF quantized files for nex-agi/Nex-N2-Pro.

Original Model: nex-agi/Nex-N2-Pro
Architecture: Qwen3.5-397B-A17B
License: Apache 2.0
MTP Support: yes (MTP donor: unsloth/Qwen3.5-397B-A17B-MTP-GGUF)

Quant Type	Size	Description
IQ1+	100 GB	Mixed Precision for Better Quality
IQ2_XS	142 GB	Mixed Precision for Better Quality
Q2_K	158 GB	Standard llama.cpp quantization
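
To compare the three quantizations, the file sizes can be converted into an average number of bits per weight. This is a rough sketch: it assumes the sizes in the table are decimal gigabytes, uses the 403B parameter count reported for this repository, and ignores GGUF metadata overhead. Actual memory use at run time is higher because of the KV cache and compute buffers.

```python
PARAMS_BILLION = 403  # parameter count reported for this repository
QUANT_SIZES_GB = {"IQ1+": 100, "IQ2_XS": 142, "Q2_K": 158}  # from the table above


def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter (GB = 1e9 bytes, so the 1e9 factors cancel)."""
    return size_gb * 8 / params_billion


for name, size_gb in QUANT_SIZES_GB.items():
    print(f"{name}: {bits_per_weight(size_gb, PARAMS_BILLION):.2f} bits/weight")
# IQ1+: 1.99 bits/weight
# IQ2_XS: 2.82 bits/weight
# Q2_K: 3.14 bits/weight
```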

An agentic model with Agentic Thinking.

Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2 — an agent model built for real-world productivity scenarios. With first-tier coding and agentic capabilities, Nex-N2 keeps driving complex, long-horizon tasks forward in real environments to deliver stable, end-to-end results.

Over the past year, a paradigm shift led by Vibe Coding and Harness Engineering has been redefining the limits of LLM agents. From dialogue, to reasoning, to agents that execute long-horizon tasks with environmental feedback, the tasks models must handle keep growing harder, the contexts longer, and the environments more realistic. The core of next-generation model competition is no longer whether a model can think, but whether it can reliably and efficiently turn thinking into actions that are executable, verifiable, and iterable.

Rather than treating reasoning, tool use, and environment execution as separate capabilities, Nex-N2 unifies them through an Agentic Thinking framework that connects requirement understanding, task planning, code implementation, environmental feedback, evaluation and debugging, and continuous iteration into a single closed loop. The framework has two parts:

Adaptive Thinking lets the model decide on its own when to think and how deeply: it executes simple actions quickly and reasons thoroughly on critical decisions.
Coherent Thinking carries one reasoning paradigm across general reasoning, diverse agentic tasks, and modalities, which enables stable capability transfer.

Across real agentic workflows — agentic coding, deep research, tool calling, and terminal execution — Nex-N2 reaches first-tier performance, with substantial gains over the previous-generation Nex-N1 on multiple authoritative benchmarks. In real productivity scenarios such as OpenClaw one-person-company workflows, end-to-end game development, and web and multimodal generation, it likewise demonstrates outstanding usability, robustness, and stability.

Performance

Benchmark	Nex-N2-mini	Nex-N2-Pro	GPT-5.5	Opus 4.7	Kimi-K2.6	GLM-5.1	MiniMax M3	DeepSeek-V4-Pro
Agent
BrowseComp	74.1	83.7	84.4	79.8	83.2	79.3	83.5	83.4
GDPval	1402	1585	1769	1753	1481	1535	-	1554
Toolathlon	33.3	51.9	55.6	52.8	50.0	40.7	-	51.8
WildClawBench	47.7	53.5	58.2	62.2	-	48.2	-	43.7
WideSearch	62.0	75.6	-	-	80.8	-	-	-
TAU3	65.9	71.1	-	-	-	70.6	-	-
Coding & SWE
SWE-Bench Pro	50.2	58.8	58.6	64.3	58.6	58.4	59.0	55.4
Terminal-Bench 2.1	60.7	75.3	83.4	69.7	-	58.7	66.0	72.0
DeepSWE	8.0	33.6	70	54	24	18	-	8
SWE-Bench Verified	74.4	80.8	82.9	87.6	80.2	-	80.5	80.6
SWE Atlas QnA	31.5	37.9	45.4	45.2	-	-	37.9	-
SWE Atlas RF	30.0	32.9	44.8	48.6	-	-	-	-
SWE Atlas TW	23.3	40.0	42.6	38.2	-	-	30.8	-
General & Reasoning
GPQA Diamond	82.6	90.7	93.6	94.2	90.5	86.2	-	90.1
IFEval	89.1	94.0	-	-	94.5	94.5	-	91.9
Apex	9.4	36.5	-	-	24.0	11.5	-	38.3

How to Use

These GGUF files work with llama.cpp and with popular tools built on it, such as LM Studio and Ollama.

Example using `llama.cpp` CLI:

./llama-cli -m nex-n2-pro-Q2_K-00001-of-00023.gguf \
  -p "Hello, how are you?" \
  -sys "You are a helpful AI" \
  -n 4096 \
  -c 8192
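
The same run can be expressed with llama-cpp-python. In this sketch the CLI flags map to Python arguments as follows: `-sys` becomes a system message, `-p` a user message, `-n` becomes `max_tokens`, and `-c` becomes `n_ctx`. It assumes all shards sit in the same directory as the first one so that the loader can find them; `build_messages` and `run_local` are hypothetical helper names.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Chat messages equivalent to the -sys and -p flags."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def run_local(model_path: str, system_prompt: str, user_prompt: str,
              n_ctx: int = 8192, max_tokens: int = 4096) -> str:
    """Load a local GGUF file and return the assistant reply."""
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=n_ctx)          # -c 8192
    response = llm.create_chat_completion(
        messages=build_messages(system_prompt, user_prompt),
        max_tokens=max_tokens,                               # -n 4096
    )
    return response["choices"][0]["message"]["content"]
```

For example, `run_local("nex-n2-pro-Q2_K-00001-of-00023.gguf", "You are a helpful AI", "Hello, how are you?")` mirrors the command above.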

Downloads last month: 4,487

GGUF

Model size: 403B params
Architecture: qwen35moe


Model tree for morikomorizz/Nex-N2-Pro-MTP-GGUF

Base model: nex-agi/Nex-N2-Pro
Quantized (29): this model
