Instructions for using nex-agi/Nex-N2-Pro with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nex-agi/Nex-N2-Pro with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

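# The messages below include an image and are passed as text=..., which matches
# the image-text-to-text task (assumes this checkpoint accepts image input).
pipe = pipeline("image-text-to-text", model="nex-agi/Nex-N2-Pro")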
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("nex-agi/Nex-N2-Pro")
model = AutoModelForMultimodalLM.from_pretrained("nex-agi/Nex-N2-Pro")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use nex-agi/Nex-N2-Pro with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nex-agi/Nex-N2-Pro"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nex-agi/Nex-N2-Pro",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/nex-agi/Nex-N2-Pro

SGLang

How to use nex-agi/Nex-N2-Pro with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nex-agi/Nex-N2-Pro" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nex-agi/Nex-N2-Pro",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nex-agi/Nex-N2-Pro" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nex-agi/Nex-N2-Pro",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use nex-agi/Nex-N2-Pro with Docker Model Runner:
```
docker model run hf.co/nex-agi/Nex-N2-Pro
```

Fix thinking bug in jinja template

by huaj1ng - opened 3 days ago

base: refs/heads/main ← from: refs/pr/7


huaj1ng

3 days ago

Without the \n after <think>, the thinking content gets mixed into the normal conversational text.

This matches how Jinja templates are written in comparable projects, for example:

This should resolve https://github.com/nex-agi/Nex-N2/issues/2
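
The failure mode described above can be pictured with a toy splitter. This is only a sketch under stated assumptions: `STRICT`, `TOLERANT`, and `split_reasoning` are hypothetical names, the sample strings are made up, and none of this is the actual code of llama.cpp, SGLang, or the model's chat template. It shows how a parser that insists on a newline right after `<think>` leaves the whole reasoning block in the answer text when that newline is absent.

```python
import re

# Hypothetical strict splitter: only recognises a reasoning block that opens
# with "<think>" followed immediately by a newline.
STRICT = re.compile(r"^<think>\n(.*?)</think>\s*", re.DOTALL)

# Hypothetical tolerant splitter: allows any (or no) whitespace after the tag.
TOLERANT = re.compile(r"^<think>\s*(.*?)</think>\s*", re.DOTALL)


def split_reasoning(raw: str, pattern: re.Pattern) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if the pattern misses."""
    match = pattern.match(raw)
    if match is None:
        # The block was not recognised, so everything, tags included,
        # is treated as ordinary answer text.
        return "", raw
    return match.group(1).strip(), raw[match.end():]


# Made-up model outputs that differ only in the character after <think>.
with_newline = "<think>\nThe user wants a capital city.\n</think>\n\nParis."
without_newline = "<think>The user wants a capital city.\n</think>\n\nParis."

for label, raw in [("with newline", with_newline), ("without newline", without_newline)]:
    for name, pattern in [("strict", STRICT), ("tolerant", TOLERANT)]:
        reasoning, answer = split_reasoning(raw, pattern)
        print(f"{label:16} {name:9} reasoning={reasoning!r} answer={answer!r}")
```

In this toy setup only the strict splitter on the newline-free input returns an empty reasoning string, which is the "mixed into normal text" symptom. Whether the fix belongs in the template or in the parser is a separate question.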

Fix thinking bug in jinja template (34589c26)

00index

Nex AGI org 3 days ago

Hi @huaj1ng , thanks a lot for the contribution and for digging into the chat template! 🙏

To help us verify the fix, could you share a bit more detail?

A repro: the original request (rendered prompt or messages) and the raw response in which the thinking content blended into the regular text.
Serving stack: were you using our recommended sglang branch, upstream sglang, or another engine, and was --reasoning-parser qwen3 enabled? The rendering of <think> can differ across stacks.

This will help us confirm the change matches the training-time format before merging. Thanks again!
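
One way to capture the kind of repro requested above is to send the same request the curl examples use and record which response field the thinking text lands in. The sketch below uses only the Python standard library. The helper names (`build_payload`, `summarize`, `send`) are hypothetical, and the `reasoning_content` field name is an assumption about how a server with a reasoning parser enabled reports thinking text; it may differ by serving stack and version.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str) -> dict:
    """Build a chat-completions request body like the curl examples use."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def summarize(response: dict) -> dict:
    """Pull out the fields that show where the thinking text ended up."""
    message = response["choices"][0]["message"]
    return {
        # Assumption: a server with a reasoning parser enabled returns the
        # thinking text in a separate field; the name may differ by stack.
        "reasoning_content": message.get("reasoning_content"),
        "content": message.get("content"),
    }


def send(base_url: str, payload: dict) -> dict:
    """POST the payload to an OpenAI-compatible server and return the JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a server running locally, something like `print(json.dumps(summarize(send("http://localhost:30000", build_payload("nex-agi/Nex-N2-Pro", "What is the capital of France?"))), indent=2))` would print both fields side by side. If `content` still contains `<think>` text, that output together with the payload makes a compact repro.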

00index

Nex AGI org 1 day ago

Hi @huaj1ng — thanks for the report.

After investigation, the root cause turned out to be in llama.cpp's reasoning parser, not the template.

Adding \n after <think> does work around it, but the model was trained strictly on the current template, so deviating from it at inference time may hurt output quality. We'd rather keep the template as-is.

We've patched llama.cpp and verified the fix with the unmodified GGUF. Builds are available now:

Binaries: https://github.com/nex-agi/llama.cpp/releases/tag/nex-b9596-fix-b9599-9cd1771
Docker: docker pull ghcr.io/nex-agi/llama.cpp:server-cuda-nex-b9596-fix-b9598-8c0d5c9 (more variants at https://github.com/orgs/nex-agi/packages)

We'll submit the patch upstream to llama.cpp shortly — once merged, stock llama.cpp will work out of the box. We'll update this thread with the PR link.

00index changed pull request status to closed 1 day ago
