Instructions to use nex-agi/Nex-N2-mini with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nex-agi/Nex-N2-mini with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages mix an image and text, so the image-text-to-text task is needed
pipe = pipeline("image-text-to-text", model="nex-agi/Nex-N2-mini")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("nex-agi/Nex-N2-mini")
model = AutoModelForImageTextToText.from_pretrained("nex-agi/Nex-N2-mini")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Apply the chat template and tokenize the text and image in one step
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# generate() returns prompt + new tokens; slice off the prompt before decoding
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use nex-agi/Nex-N2-mini with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nex-agi/Nex-N2-mini"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nex-agi/Nex-N2-mini",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
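The same request can also be sent from Python without extra dependencies. The sketch below uses only the standard library (`json`, `urllib.request`) and assumes the server started above is listening on its default port 8000. `build_chat_request`, `extract_reply`, and `ask` are hypothetical helper names, not part of vLLM, and the response parsing assumes the standard OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumptions: the vLLM server from the commands above is listening on its
# default port 8000 and serves the OpenAI-compatible chat completions route.
BASE_URL = "http://localhost:8000"
MODEL = "nex-agi/Nex-N2-mini"


def build_chat_request(prompt, base_url=BASE_URL):
    """Build (without sending) the same POST request as the curl example."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, base_url=BASE_URL):
    """Send the request and return the reply; requires a running server."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `ask("What is the capital of France?")` sends the request and returns the model's reply. Keeping request construction separate from sending makes the payload easy to inspect without a live server.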

Use Docker

```
docker model run hf.co/nex-agi/Nex-N2-mini
```

SGLang

How to use nex-agi/Nex-N2-mini with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nex-agi/Nex-N2-mini" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nex-agi/Nex-N2-mini",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
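Because the model also accepts images (as in the Transformers example), a vision request to the OpenAI-compatible endpoint needs a list of content parts rather than a plain string. The sketch below only builds the JSON body. It assumes the server accepts the OpenAI-style `image_url` content part for this model, which the instructions above do not confirm, and `image_chat_payload` is a hypothetical helper name.

```python
import json

CANDY_URL = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"


def image_chat_payload(image_url, question, model="nex-agi/Nex-N2-mini", max_tokens=40):
    """Build an OpenAI-style chat body pairing one image with a text question.

    Assumption: the server accepts the "image_url" content-part format for
    this model; the helper name is hypothetical.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    # Print the JSON body so it can be piped into curl.
    print(json.dumps(image_chat_payload(CANDY_URL, "What animal is on the candy?")))
```

Saved as, for example, `payload.py` (a hypothetical file name), its output can be piped into the same endpoint with `python3 payload.py | curl -X POST "http://localhost:30000/v1/chat/completions" -H "Content-Type: application/json" --data @-`.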

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nex-agi/Nex-N2-mini" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nex-agi/Nex-N2-mini",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use nex-agi/Nex-N2-mini with Docker Model Runner:
```
docker model run hf.co/nex-agi/Nex-N2-mini
```

MTP Support?

by volodXYZ - opened 2 days ago

Discussion

volodXYZ

2 days ago

I've noticed that the config specifies one MTP hidden layer, but the weights do not include separately named mtp.* tensors, as Qwen A3B-35B did. Will you be training MTP to work with your model in the future? Thanks.

config.json:

```json
"mtp_num_hidden_layers": 1,
"mtp_use_dedicated_embeddings": false
```
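The mismatch described here (an MTP layer declared in the config but no matching tensors in the checkpoint) can be checked on a local copy of the repository. This is a minimal sketch under several assumptions: the checkpoint is sharded with the standard `model.safetensors.index.json`, whose `weight_map` lists every tensor name; MTP tensors carry an `mtp` name component, as the post describes; and the `text_config` fallback is only a guess for multimodal configs. `declared_mtp_layers`, `mtp_status`, and `load_local_checkpoint` are hypothetical helpers.

```python
import json
from pathlib import Path


def declared_mtp_layers(config):
    """Read mtp_num_hidden_layers, falling back to a nested text_config
    (an assumption for multimodal configs; the key may live at the top level)."""
    if "mtp_num_hidden_layers" in config:
        return config["mtp_num_hidden_layers"]
    return config.get("text_config", {}).get("mtp_num_hidden_layers", 0)


def mtp_status(config, weight_map):
    """Compare the declared MTP layer count with the tensor names in the checkpoint."""
    declared = declared_mtp_layers(config)
    # Assumption: MTP tensors have "mtp" as one dotted name component,
    # e.g. "mtp.fc.weight".
    mtp_tensors = sorted(name for name in weight_map if "mtp" in name.split("."))
    return {
        "declared_layers": declared,
        "mtp_tensor_count": len(mtp_tensors),
        "mismatch": declared > 0 and not mtp_tensors,
    }


def load_local_checkpoint(model_dir):
    """Load config.json and the weight_map from model.safetensors.index.json."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    index = json.loads((root / "model.safetensors.index.json").read_text())
    return config, index["weight_map"]
```

For example, `huggingface_hub.snapshot_download("nex-agi/Nex-N2-mini", allow_patterns=["*.json"])` fetches only the JSON files and returns a local directory that can be passed to `load_local_checkpoint`; `mtp_status(*load_local_checkpoint(path))` then reports whether the declared layer has weights.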

Meteonis

Nex AGI org 2 days ago

Hi @volodXYZ , thanks for flagging this! The MTP tensors aren't part of the current weights yet — we're still validating the stability of Nex-N2 with MTP speculative decoding. Once that testing wraps up, we'll update the weights to include the trained MTP layer. Stay tuned!

volodXYZ

2 days ago

Thank you for the quick reply.
You should also consider offering quantized .gguf files (Q8, Q4, etc.) and fixing the chat template for llama.cpp if you want the model to see wider adoption among local users.
Without modifications, thinking does not work with the default Qwen 3.6 chat template, and neither does their "preserve thinking" parameter.
