Instructions to use nex-agi/Nex-N2-mini with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nex-agi/Nex-N2-mini with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="nex-agi/Nex-N2-mini") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("nex-agi/Nex-N2-mini") model = AutoModelForImageTextToText.from_pretrained("nex-agi/Nex-N2-mini") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] inputs = processor.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nex-agi/Nex-N2-mini with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nex-agi/Nex-N2-mini" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nex-agi/Nex-N2-mini", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/nex-agi/Nex-N2-mini
- SGLang
How to use nex-agi/Nex-N2-mini with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nex-agi/Nex-N2-mini" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nex-agi/Nex-N2-mini", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nex-agi/Nex-N2-mini" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nex-agi/Nex-N2-mini", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use nex-agi/Nex-N2-mini with Docker Model Runner:
docker model run hf.co/nex-agi/Nex-N2-mini
MTP Support?
I've noticed that the config says 1x MTP hidden layer, but the weights do not have separately named mtp.* tensors like Qwen A3B-35B had. Will you be training MTP to work with your model as well in the future? Thanks.
config.json:
"mtp_num_hidden_layers": 1,
"mtp_use_dedicated_embeddings": false
Thank you for the quick reply.
You should also consider offering quantized .gguf files (Q8, Q4, etc.) and fixing the chat template for llama.cpp, if you want your model to get more adoption with local users.
Without modifications, thinking does not work with the default Qwen 3.6 chat template, and neither does their "preserve thinking" parameter.