Instructions to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="SC117/Nex-N2-mini-template-fix-APEX-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("SC117/Nex-N2-mini-template-fix-APEX-GGUF", dtype="auto")

llama-cpp-python

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="SC117/Nex-N2-mini-template-fix-APEX-GGUF",
	filename="Nex-N2-mini-APEX-I-Balanced.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16
# Run inference directly in the terminal:
./llama-cli -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Use Docker

docker model run hf.co/SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

vLLM

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SC117/Nex-N2-mini-template-fix-APEX-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Nex-N2-mini-template-fix-APEX-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

SGLang

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SC117/Nex-N2-mini-template-fix-APEX-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Nex-N2-mini-template-fix-APEX-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SC117/Nex-N2-mini-template-fix-APEX-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Nex-N2-mini-template-fix-APEX-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Ollama:
```
ollama run hf.co/SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16
```

Unsloth Studio

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SC117/Nex-N2-mini-template-fix-APEX-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SC117/Nex-N2-mini-template-fix-APEX-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for SC117/Nex-N2-mini-template-fix-APEX-GGUF to start chatting

Pi

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Run Hermes

hermes

Docker Model Runner
How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Docker Model Runner:
```
docker model run hf.co/SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16
```

Lemonade

How to use SC117/Nex-N2-mini-template-fix-APEX-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull SC117/Nex-N2-mini-template-fix-APEX-GGUF:BF16

Run and chat with the model

lemonade run user.Nex-N2-mini-template-fix-APEX-GGUF-BF16

List all available models

lemonade list

APEX Vision Agentic ⚠ Template Fix

Nex-N2-mini

📖 Chinese documentation (中文文档)

Agentic Vision MoE — APEX Quantized GGUF (Stock llama.cpp)

⚠️ Temporary Workaround — Not Official

👉 Want the unmodified, recommended version? Go to SC117/Nex-N2-mini-APEX-GGUF (use with Nex's patched llama.cpp)

This is a temporary, unofficial workaround for stock llama.cpp users. The original chat_template.jinja has been replaced with a fixed version so that --reasoning-format works without --chat-template-file.

⚠️ The Nex team explicitly recommends against modifying the chat template. The model was trained strictly on the original template — deviating from the training-time format may degrade output quality. See discussion #3 for details.

The recommended approach is to use Nex's patched llama.cpp with the unmodified GGUF. Once Nex's upstream patch is merged into stock llama.cpp, these template-fixed GGUFs will be superseded.

Use this only if you cannot use Nex's patched llama.cpp and need thinking mode to work on stock builds. Be aware that output quality may differ from the original model.

💡 What is APEX?

These GGUF files are quantized using APEX, a MoE-aware mixed-precision quantization technique that outperforms standard quantization methods while being significantly smaller.

APEX beats Q8_0 perplexity at half the size — and even beats F16.

APEX classifies every tensor by its role — routed expert, shared expert, or attention — and applies a layer-wise precision gradient, giving the most sensitive edge layers higher precision and compressing the redundant middle layers more aggressively.
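To make the idea concrete, here is a minimal sketch of role-based classification combined with a layer-position gradient. It is an illustration only, not APEX's actual recipe: the quant types, the `EDGE_LAYERS` cutoff, and the helper names are all hypothetical, and the tensor-name patterns assume the usual llama.cpp GGUF naming (`blk.N.ffn_*_exps`, `blk.N.ffn_*_shexp`, `blk.N.attn_*`).

```python
import re

# Assumption: how many layers at each end of the stack count as "edge" layers.
EDGE_LAYERS = 4


def tensor_role(name: str) -> str:
    """Classify a GGUF tensor name by its role inside a MoE block."""
    if "_shexp" in name:
        return "shared_expert"
    if "_exps" in name:
        return "routed_expert"
    if ".attn_" in name:
        return "attention"
    return "other"


def layer_index(name: str):
    """Return the block index from names like 'blk.12.attn_q.weight', else None."""
    match = re.match(r"blk\.(\d+)\.", name)
    return int(match.group(1)) if match else None


def pick_quant(name: str, n_layers: int) -> str:
    """Pick a (hypothetical) quant type from the tensor's role and layer position."""
    role = tensor_role(name)
    idx = layer_index(name)
    is_edge = idx is not None and (idx < EDGE_LAYERS or idx >= n_layers - EDGE_LAYERS)
    if role == "routed_expert":
        # Routed experts hold most of the weights, so they get the lowest precision.
        return "Q6_K" if is_edge else "Q4_K"
    if role in ("shared_expert", "attention"):
        return "Q8_0" if is_edge else "Q6_K"
    # Embeddings, norms, router weights and anything unrecognised stay at high precision.
    return "Q8_0"
```

With the 40 layers listed under Model Details, `pick_quant("blk.20.ffn_down_exps.weight", 40)` falls in the middle of the stack and is compressed hardest, while the same tensor in block 0 or block 39 keeps more bits.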

📦 Available Files

| File | Size | BPW | Note |
|---|---|---|---|
| `Nex-N2-mini.BF16.gguf` | 64.6 GB | 16.0 | Full precision reference |
| `Nex-N2-mini-APEX-I-Quality.gguf` | 21.3 GB | 5.23 | Highest quality, best accuracy |
| `Nex-N2-mini-APEX-I-Balanced.gguf` | 23.6 GB | 5.85 | Best all-rounder, recommended |
| `Nex-N2-mini-APEX-I-Compact.gguf` | 15.4 GB | 3.81 | Best quality/size ratio, 16GB VRAM |
| `mmproj-Nex-N2-mini.F16.gguf` | 858 MB | - | Vision projector (required for image/video) |
| `original-chat-template.jinja` | 7.9 KB | - | Original unmodified template, for reference or use with Nex's patched llama.cpp |

⚠ All GGUF files above (except BF16 and mmproj) contain a modified chat_template.jinja. See warning above.
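The BPW (bits per weight) column ties file size to precision. As a rough check, the sketch below scales the BF16 file size by `bpw / 16`. It assumes BPW is an average over essentially all weights in the file and ignores metadata overhead; the function name is made up for this example.

```python
# Size of the BF16 reference file from the table above (16 bits per weight).
BF16_SIZE_GB = 64.6


def estimated_size_gb(bpw: float, bf16_size_gb: float = BF16_SIZE_GB) -> float:
    """Estimate a quantized file's size by scaling the BF16 size by bpw / 16."""
    return bf16_size_gb * bpw / 16.0


for tier, bpw in [("I-Quality", 5.23), ("I-Balanced", 5.85), ("I-Compact", 3.81)]:
    print(f"{tier}: ~{estimated_size_gb(bpw):.1f} GB")
```

The estimates land within roughly 0.2 GB of the listed sizes, which is a quick way to judge whether a given tier will fit in your VRAM before downloading it.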

🧠 Model Details

| Property | Value |
|---|---|
| Architecture | Qwen3.5 MoE (GatedDeltaNet + Full Attention) + Vision Encoder |
| Parameters | 35B total, 3B active per token |
| Experts | 256 routed experts, 8 active per token |
| Layers | 40 layers (30 linear_attn + 10 full_attn) |
| Context | 262,144 tokens |
| Vision | Image + Video support (mmproj 858MB) |
| Thinking | Qwen3-style think tags; works on stock llama.cpp via the modified template |
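When the server parses reasoning for you, the thinking text should arrive separately from the answer. If you instead get raw text with the think tags still in it, a small helper can separate the two. The sketch below is an assumption about the output shape, not part of this repository: it handles a full `<think>...</think>` pair, and also the case where only the closing tag appears because the template opened the block in the prompt. The function name `split_thinking` is made up for this example.

```python
import re

# Non-greedy match so only the first think block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str):
    """Return (reasoning, answer) extracted from raw model output."""
    match = THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = (text[:match.start()] + text[match.end():]).strip()
        return reasoning, answer
    # Only a closing tag: everything before it is treated as reasoning.
    if "</think>" in text:
        reasoning, _, answer = text.partition("</think>")
        return reasoning.strip(), answer.strip()
    # No think tags at all: the whole text is the answer.
    return "", text.strip()
```

For example, `split_thinking("<think>\nplan\n</think>\n\nHello")` separates the plan from the final reply, so a client can log or hide the reasoning independently.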

🚀 Usage (Stock llama.cpp)

Text only

./llama-server \
  -m Nex-N2-mini-APEX-I-Quality.gguf \
  -ngl 99 -ncmoe 19 -c 32768 \
  --host 0.0.0.0 --port 8081

With vision

./llama-server \
  -m Nex-N2-mini-APEX-I-Quality.gguf \
  --mmproj mmproj-Nex-N2-mini.F16.gguf \
  -ngl 99 -ncmoe 19 -c 32768 \
  --host 0.0.0.0 --port 8081
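Once the server is running with the vision projector loaded, images can be sent through the OpenAI-compatible chat endpoint. The sketch below is one way to do that from Python using only the standard library. It assumes the server listens on port 8081 as in the command above and accepts an `image_url` part carrying a base64 data URL; the function names and the `photo.jpg` file are hypothetical.

```python
import base64
import json
import urllib.request

# Port 8081 matches the llama-server command above.
SERVER_URL = "http://localhost:8081/v1/chat/completions"


def build_vision_payload(image_bytes: bytes, question: str, mime: str = "image/jpeg") -> dict:
    """Build a chat request with one text part and one inline (base64) image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                ],
            }
        ]
    }


def ask(payload: dict, url: str = SERVER_URL) -> str:
    """POST the payload and return the text of the first choice."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["message"]["content"]


# Usage (needs a running server and a local image; "photo.jpg" is a placeholder):
#   with open("photo.jpg", "rb") as f:
#       print(ask(build_vision_payload(f.read(), "Describe this image in one sentence.")))
```

Sending the image inline avoids depending on the server being able to fetch remote URLs.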

No --chat-template-file needed: the fixed template is embedded in the GGUF, so thinking mode works out of the box.

Add --mmproj mmproj-Nex-N2-mini.F16.gguf for vision.

Replace Nex-N2-mini-APEX-I-Quality.gguf with your preferred quantization tier (I-Quality / I-Balanced / I-Compact).

Recommended sampling: temperature 0.7, top_p 0.95, top_k 40, min_p 0.
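The recommended sampling values can be passed per request. Here is a minimal sketch, assuming the server from the commands above is listening on port 8081 and accepts `top_k` and `min_p` as extra fields on its OpenAI-compatible chat endpoint (they are not part of the OpenAI API itself). The helper names are hypothetical.

```python
import json
import urllib.request

# Port 8081 matches the llama-server commands above.
SERVER_URL = "http://localhost:8081/v1/chat/completions"

# Sampling values recommended in this model card.
RECOMMENDED_SAMPLING = {"temperature": 0.7, "top_p": 0.95, "top_k": 40, "min_p": 0.0}


def build_chat_payload(prompt: str, **overrides) -> dict:
    """Build a chat request body that carries the recommended sampling values."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    payload.update(RECOMMENDED_SAMPLING)
    payload.update(overrides)  # caller-supplied fields win, e.g. max_tokens=512
    return payload


def chat(prompt: str, url: str = SERVER_URL, **overrides) -> dict:
    """POST the request and return the first choice's message dict."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(prompt, **overrides)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["message"]
```

Keeping the defaults in one dictionary makes it easy to override a single value for an experiment, for example `chat("Hello", temperature=0.2)`, without retyping the rest.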

📋 Original Model Benchmarks

| Benchmark | Score | Category |
|---|---|---|
| BrowseComp | 74.1 | Agent |
| SWE-Bench Verified | 74.4 | Coding |
| Terminal-Bench 2.1 | 60.7 | Coding |
| GPQA Diamond | 82.6 | Reasoning |
| IFEval | 89.1 | Instruction |
From the original Nex-N2-mini model card (BF16, full precision).

Model tree for SC117/Nex-N2-mini-template-fix-APEX-GGUF

Base model: nex-agi/Nex-N2-mini

Quantized (49): this model