Nex-N2-Pro-GGUF
Overview
This repository contains the GGUF quantized files for nex-agi/Nex-N2-Pro.
- Original Model: nex-agi/Nex-N2-Pro
- Architecture: Qwen3.5-397B-A17B
- License: Apache 2.0
- MTP Support: yes; the MTP donor is unsloth/Qwen3.5-397B-A17B-MTP-GGUF
An agentic model with Agentic Thinking.
Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2 — an agent model built for real-world productivity scenarios. With first-tier coding and agentic capabilities, Nex-N2 keeps driving complex, long-horizon tasks forward in real environments to deliver stable, end-to-end results.
Over the past year, a paradigm shift led by Vibe Coding and Harness Engineering has been redefining the limits of LLM agents. From dialogue, to reasoning, to agents that execute long-horizon tasks with environmental feedback, the tasks models must handle keep growing harder, the contexts longer, and the environments more realistic. The core of next-generation model competition is no longer whether a model can think, but whether it can reliably and efficiently turn thinking into actions that are executable, verifiable, and iterable.
Rather than treating reasoning, tool use, and environment execution as separate capabilities, Nex-N2 unifies them through an Agentic Thinking framework that connects requirement understanding, task planning, code implementation, environmental feedback, evaluation and debugging, and continuous iteration into a single closed loop. The framework has two parts:
- Adaptive Thinking lets the model decide on its own when to think and how deeply — executing simple actions quickly while reasoning thoroughly on critical decisions.
- Coherent Thinking carries one consistent reasoning paradigm across general reasoning and diverse agentic tasks, staying consistent across tasks and modalities to enable stable capability transfer.
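
The closed loop and the adaptive-thinking idea described above can be pictured with a toy sketch. Everything in it is hypothetical and purely illustrative: the `Step` class, the `critical` flag, the integer "thinking budget", and the stub environment are assumptions made for this example and say nothing about how Nex-N2 is actually trained or implemented.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    critical: bool  # hypothetical flag: does this step deserve deeper reasoning?


def thinking_budget(step, shallow=0, deep=3):
    """Toy adaptive thinking: spend reasoning effort only on critical steps."""
    return deep if step.critical else shallow


def run_closed_loop(plan, execute, max_iterations=10):
    """Work through the plan, retrying failed steps with a larger budget."""
    queue = list(plan)
    trace = []
    for _ in range(max_iterations):
        if not queue:
            break
        step = queue.pop(0)
        budget = thinking_budget(step)
        ok = execute(step, budget)  # act, then observe the environment's feedback
        trace.append((step.action, budget, ok))
        if not ok:
            # Evaluate and iterate: put the step back, now marked as critical.
            queue.insert(0, Step(step.action, critical=True))
    return trace


def fake_environment(step, budget):
    # Stand-in for real tool execution: the test step passes only after deeper reasoning.
    return step.action != "run tests" or budget > 0


plan = [
    Step("read requirements", critical=False),
    Step("write code", critical=False),
    Step("run tests", critical=False),
]
trace = run_closed_loop(plan, fake_environment)
for action, budget, ok in trace:
    print(f"{action:18s} budget={budget} ok={ok}")
```

In this toy run the first two steps go through with no extra reasoning, the test step fails once, and the retry succeeds with the larger budget, which is the "fast on simple actions, thorough on critical decisions" behaviour in miniature.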
Across real agentic workflows — agentic coding, deep research, tool calling, and terminal execution — Nex-N2 reaches first-tier performance, with substantial gains over the previous-generation Nex-N1 on multiple authoritative benchmarks. In real productivity scenarios such as OpenClaw one-person-company workflows, end-to-end game development, and web and multimodal generation, it likewise demonstrates outstanding usability, robustness, and stability.
Performance
| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| Agent | | | | | | | | |
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| Coding & SWE | | | | | | | | |
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| General & Reasoning | | | | | | | | |
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |
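
One way to read the table is to ask, per benchmark, where Nex-N2-Pro ranks among the models that report a score and how far it sits from the best one. The sketch below does that for three rows copied from the table; the `summarize` helper and the use of `None` for "-" cells are choices made for this example, not part of the original evaluation.

```python
# Column order and scores are copied from the table; None marks a "-" cell.
MODELS = [
    "Nex-N2-mini", "Nex-N2-Pro", "GPT-5.5", "Opus 4.7",
    "Kimi-K2.6", "GLM-5.1", "MiniMax M3", "DeepSeek-V4-Pro",
]
ROWS = {
    "BrowseComp": [74.1, 83.7, 84.4, 79.8, 83.2, 79.3, 83.5, 83.4],
    "SWE-Bench Pro": [50.2, 58.8, 58.6, 64.3, 58.6, 58.4, 59.0, 55.4],
    "Terminal-Bench 2.1": [60.7, 75.3, 83.4, 69.7, None, 58.7, 66.0, 72.0],
}


def summarize(scores, target="Nex-N2-Pro"):
    """Rank the target among models with a reported score (higher is better)."""
    reported = {m: s for m, s in zip(MODELS, scores) if s is not None}
    ranked = sorted(reported, key=reported.get, reverse=True)
    best = ranked[0]
    return {
        "rank": ranked.index(target) + 1,
        "of": len(ranked),
        "best": best,
        "gap": round(reported[best] - reported[target], 1),
    }


for name, scores in ROWS.items():
    s = summarize(scores)
    print(f"{name}: rank {s['rank']}/{s['of']}, {s['gap']} behind {s['best']}")
```

Missing cells are dropped before ranking, so the denominator changes from row to row; a rank on a sparsely reported benchmark is therefore not comparable to one on a fully reported benchmark.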
How to Use
These GGUF files are compatible with llama.cpp and with popular tools built on it, such as LM Studio and Ollama.
Example using llama.cpp CLI:
./llama-cli -m nex-n2-pro-Q2_K-00001-of-00023.gguf \
-p "Hello, how are you?" \
-sys "You are a helpful AI" \
-n 4096 \
-c 8192
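
The model is split into 23 shards, and llama.cpp picks up the remaining shards when given the first one, so a missing file only shows up at load time. The following is a minimal preflight sketch, assuming the shards follow the `-00001-of-00023.gguf` naming seen in the command above and sit in a single directory. The helper names (`shard_names`, `missing_shards`, `llama_cli_args`) are hypothetical, and the flags are only the ones the example above already uses.

```python
import re
import shlex
from pathlib import Path

# Matches split-GGUF names such as "nex-n2-pro-Q2_K-00001-of-00023.gguf".
SHARD_RE = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def shard_names(first_shard: str) -> list[str]:
    """Return every shard filename implied by the first shard's name."""
    match = SHARD_RE.match(first_shard)
    if match is None:
        raise ValueError(f"not a split GGUF filename: {first_shard}")
    total = int(match["total"])
    return [f"{match['stem']}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(model_dir: Path, first_shard: str) -> list[str]:
    """List the shards that are not present in model_dir."""
    return [n for n in shard_names(first_shard) if not (model_dir / n).is_file()]


def llama_cli_args(model_path, prompt, system_prompt, n_predict=4096, ctx_size=8192):
    """Build the argument list for the llama-cli example shown above."""
    return [
        "./llama-cli",
        "-m", str(model_path),
        "-p", prompt,
        "-sys", system_prompt,
        "-n", str(n_predict),
        "-c", str(ctx_size),
    ]


first = "nex-n2-pro-Q2_K-00001-of-00023.gguf"
model_dir = Path(".")  # assumption: the shards were downloaded to the current directory
missing = missing_shards(model_dir, first)
if missing:
    print(f"{len(missing)} shard(s) missing, first: {missing[0]}")
else:
    args = llama_cli_args(model_dir / first, "Hello, how are you?", "You are a helpful AI")
    print(shlex.join(args))  # the command that would be run
```

The sketch only prints the command. Passing the same list to `subprocess.run(args, check=True)` would launch the CLI once all shards are in place.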