Libraries

How to use edougawa/Nex-N2-mini-Abliterated with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="edougawa/Nex-N2-mini-Abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("edougawa/Nex-N2-mini-Abliterated")
model = AutoModelForMultimodalLM.from_pretrained("edougawa/Nex-N2-mini-Abliterated")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use edougawa/Nex-N2-mini-Abliterated with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "edougawa/Nex-N2-mini-Abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "edougawa/Nex-N2-mini-Abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/edougawa/Nex-N2-mini-Abliterated

SGLang

How to use edougawa/Nex-N2-mini-Abliterated with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "edougawa/Nex-N2-mini-Abliterated" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "edougawa/Nex-N2-mini-Abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "edougawa/Nex-N2-mini-Abliterated" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "edougawa/Nex-N2-mini-Abliterated",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use edougawa/Nex-N2-mini-Abliterated with Docker Model Runner:
```
docker model run hf.co/edougawa/Nex-N2-mini-Abliterated
```

Nex-N2-mini-Abliterated

This is a decensored (abliterated) version of nex-agi/Nex-N2-mini, produced with Abliterix v1.8.0.

Abliteration orthogonalizes the model weights against the measured "refusal" direction, reducing refusals while keeping the base model's capabilities as intact as possible (low KL divergence). It does not add any new knowledge or capability — all credit for the underlying model belongs to Nex-AGI. The original model card is reproduced in full below.
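
The orthogonalization step described above can be illustrated with a small, dependency-free sketch. This is not Abliterix's implementation: the function name `orthogonalize`, the `scale` argument, and the toy matrix are assumptions made for illustration. It subtracts from a weight matrix W the component of its output that lies along a unit direction r, that is, W' = W - scale * r (r^T W).

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(weight, direction, scale=1.0):
    """Remove the output component of `weight` along `direction`.

    weight:    list of rows, shape (d_out, d_in)
    direction: sequence of length d_out (hypothetical "refusal" direction)
    scale:     1.0 removes the component fully; smaller values remove part of it
    """
    norm = math.sqrt(dot(direction, direction))
    r = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    # coeffs[j] = r . (column j of W), i.e. how much column j writes along r
    coeffs = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - scale * r[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_out)
    ]


# Toy values, not taken from the model.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_direction = [1.0, 1.0, 0.0]
W_abl = orthogonalize(W, refusal_direction)

# After full orthogonalization, no column of W_abl has a component along r.
unit = [x / math.sqrt(2.0) for x in refusal_direction]
leftover = [sum(unit[i] * W_abl[i][j] for i in range(3)) for j in range(2)]
print(all(abs(v) < 1e-9 for v in leftover))
```

Rows of W that the direction does not touch (the third row here) are left unchanged, which is the intuition behind the low KL divergence reported for this checkpoint.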

Base model: nex-agi/Nex-N2-mini (Apache-2.0, by Nex-AGI)

⚠️ Safety disclaimer

This model has had its built-in refusal behavior deliberately reduced. As a result it may produce unexpected, offensive, inaccurate, or otherwise harmful output, and may comply with requests that the original model would have refused.

It is provided by the publisher, edougawa, "as is" and without warranty of any kind, express or implied. Use at your own risk.
You are solely responsible for how you use this model and for ensuring your use — and any generated output — complies with all applicable laws, regulations, and the terms of the base model's license.
To the maximum extent permitted by law, the publisher (edougawa), the base-model authors (Nex-AGI), and the Abliterix authors accept no liability for any claim, damages, or other consequences arising from the use of this model or its outputs.
Outputs do not reflect the views of the publisher (edougawa), the base-model authors (Nex-AGI), or the Abliterix authors. Apply your own safety filtering, human review, and guardrails before any production or user-facing use.

Steering parameters

Parameter	Value
vector_index	29.50
attn.k_proj.max_weight	3.43
attn.k_proj.max_weight_position	28.18
attn.k_proj.min_weight	1.88
attn.k_proj.min_weight_distance	11.27
attn.o_proj.max_weight	2.41
attn.o_proj.max_weight_position	37.58
attn.o_proj.min_weight	0.30
attn.o_proj.min_weight_distance	20.37
attn.q_proj.max_weight	2.29
attn.q_proj.max_weight_position	35.88
attn.q_proj.min_weight	0.52
attn.q_proj.min_weight_distance	9.47
attn.v_proj.max_weight	0.59
attn.v_proj.max_weight_position	29.14
attn.v_proj.min_weight	0.22
attn.v_proj.min_weight_distance	10.12
mlp.down_proj.max_weight	3.84
mlp.down_proj.max_weight_position	30.09
mlp.down_proj.min_weight	2.22
mlp.down_proj.min_weight_distance	13.71
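
The card does not define these parameters. One plausible reading, shown here purely as an assumption, is that each component gets a per-layer strength that peaks at `max_weight_position`, falls linearly to `min_weight` at `min_weight_distance` layers away, and is zero beyond that. The function `layer_weight` and the sampled layer indices are hypothetical; only the four numeric values come from the table.

```python
def layer_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """Hypothetical per-layer strength under a linear-ramp interpretation."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        # Layers outside the window are left untouched in this reading.
        return 0.0
    # Linear interpolation from max_weight (distance 0) to min_weight (edge of window).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# mlp.down_proj values copied from the table above.
down_proj = dict(
    max_weight=3.84,
    max_weight_position=30.09,
    min_weight=2.22,
    min_weight_distance=13.71,
)

# Layer indices chosen arbitrarily for illustration.
for layer in (10, 20, 30, 39):
    print(layer, round(layer_weight(layer, **down_proj), 3))
```

Under this reading, fractional positions such as 30.09 simply place the peak between two integer layers.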

Performance

Metric	This model (Nex-N2-mini-Abliterated)	Base model (Nex-N2-mini)
KL divergence	0.0180	0 (by definition)
Refusals	17/200	197/200
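
For readers who want to relate the two metrics to concrete numbers, here is a minimal sketch. The refusal counts come from the table; the two probability vectors are made-up toy distributions, and the direction of the divergence (base relative to abliterated) is an assumption, since the card does not say how its KL figure was computed.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def refusal_rate(refused, total):
    """Fraction of prompts that were refused."""
    return refused / total


# Hypothetical next-token probabilities for one prompt (not measured values).
base_probs = [0.70, 0.20, 0.10]
abliterated_probs = [0.65, 0.23, 0.12]
print(f"KL(base || abliterated) = {kl_divergence(base_probs, abliterated_probs):.4f}")

# Counts from the Performance table.
print(f"abliterated refusal rate: {refusal_rate(17, 200):.1%}")
print(f"base refusal rate:        {refusal_rate(197, 200):.1%}")
```

A distribution compared with itself gives a divergence of exactly 0, which is why the base-model column reads "0 (by definition)".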

Deployment, sampling, function-calling, and reasoning-parser instructions from the base model (see the original card below) apply unchanged to this abliterated checkpoint.

Quickstart

Download this model (edougawa/Nex-N2-mini-Abliterated) and serve it with the Nex-AGI sglang fork:

# 1. Download edougawa/Nex-N2-mini-Abliterated
hf download edougawa/Nex-N2-mini-Abliterated --local-dir ./Nex-N2-mini-Abliterated

# 2. Serve it (single node, 2× H100 — see the original card for multi-node / Docker)
python -m sglang.launch_server \
  --model-path ./Nex-N2-mini-Abliterated \
  --tp 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer

# Or load it directly with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "edougawa/Nex-N2-mini-Abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="bfloat16", device_map="auto")

Original model card

The following is the original model card for the base model, nex-agi/Nex-N2-mini, reproduced here in full with credit to Nex-AGI. The serving instructions below apply to this abliterated checkpoint as well.

🤗 Model | 🔀 OpenRouter (Enjoy two weeks free starting June 9!) | 💻 Github | 🧭 ModelScope | 🚀 Nex-AGI

Nex-N2

An agentic model with Agentic Thinking.

Today, we are officially releasing and open-sourcing our next-generation model, Nex-N2 — an agent model built for real-world productivity scenarios. With first-tier coding and agentic capabilities, Nex-N2 keeps driving complex, long-horizon tasks forward in real environments to deliver stable, end-to-end results.

Over the past year, a paradigm shift led by Vibe Coding and Harness Engineering has been redefining the limits of LLM agents. From dialogue, to reasoning, to agents that execute long-horizon tasks with environmental feedback, the tasks models must handle keep growing harder, the contexts longer, and the environments more realistic. The core of next-generation model competition is no longer whether a model can think, but whether it can reliably and efficiently turn thinking into actions that are executable, verifiable, and iterable.

Rather than treating reasoning, tool use, and environment execution as separate capabilities, Nex-N2 unifies them through an Agentic Thinking framework that connects requirement understanding, task planning, code implementation, environmental feedback, evaluation and debugging, and continuous iteration into a single closed loop. The framework has two parts:

Adaptive Thinking lets the model decide on its own when to think and how deeply — executing simple actions quickly while reasoning thoroughly on critical decisions.
Coherent Thinking carries one consistent reasoning paradigm across general reasoning and diverse agentic tasks, staying consistent across tasks and modalities to enable stable capability transfer.

Across real agentic workflows — agentic coding, deep research, tool calling, and terminal execution — Nex-N2 reaches first-tier performance, with substantial gains over the previous-generation Nex-N1 on multiple authoritative benchmarks. In real productivity scenarios such as OpenClaw one-person-company workflows, end-to-end game development, and web and multimodal generation, it likewise demonstrates outstanding usability, robustness, and stability.

Open Source

In keeping with our commitment to open source, we are releasing both Nex-N2-Pro and Nex-N2-mini as open-source models starting today.

Nex-N2-Pro: Hugging Face | ModelScope
Nex-N2-mini: Hugging Face | ModelScope
Early Access: SiliconFlow

We welcome developers and enterprises to integrate and try Nex-N2 and share their feedback.

Performance

We evaluate Nex-N2 in real agentic workflows along three directions — agentic tasks, coding tasks, and general tasks — covering benchmarks across tool calling, search-based decision-making, software engineering, and terminal execution. Nex-N2-Pro delivers strong performance that keeps pace with top-tier models such as GPT-5.5 and Opus 4.7: it excels at coding (e.g., 75.3 on Terminal-Bench 2.1) and long-horizon tasks (1585 on GDPval), and shows especially strong generalization and competitiveness on newer benchmarks like SWE-Atlas and DeepSWE. On general capability and core reasoning, it stands on par with leading frontier models.

Nex-N2 ships in two variants, both post-trained on the Qwen3.5 series: Nex-N2-Pro (built on Qwen3.5-397B-A17B) and Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), covering different latency and quality trade-offs. The table below reports their scores alongside leading proprietary and open models across our full evaluation suite.

Benchmark	Nex-N2-mini	Nex-N2-Pro	GPT-5.5	Opus 4.7	Kimi-K2.6	GLM-5.1	MiniMax M3	DeepSeek-V4-Pro
Agent
BrowseComp	74.1	83.7	84.4	79.8	83.2	79.3	83.5	83.4
GDPval	1402	1585	1769	1753	1481	1535	-	1554
Toolathlon	33.3	51.9	55.6	52.8	50.0	40.7	-	51.8
WildClawBench	47.7	53.5	58.2	62.2	-	48.2	-	43.7
WideSearch	62.0	75.6	-	-	80.8	-	-	-
TAU3	65.9	71.1	-	-	-	70.6	-	-
Coding & SWE
SWE-Bench Pro	50.2	58.8	58.6	64.3	58.6	58.4	59.0	55.4
Terminal-Bench 2.1	60.7	75.3	83.4	69.7	-	58.7	66.0	72.0
DeepSWE	8.0	33.6	70	54	24	18	-	8
SWE-Bench Verified	74.4	80.8	82.9	87.6	80.2	-	80.5	80.6
SWE Atlas QnA	31.5	37.9	45.4	45.2	-	-	37.9	-
SWE Atlas RF	30.0	32.9	44.8	48.6	-	-	-	-
SWE Atlas TW	23.3	40.0	42.6	38.2	-	-	30.8	-
General & Reasoning
GPQA Diamond	82.6	90.7	93.6	94.2	90.5	86.2	-	90.1
IFEval	89.1	94.0	-	-	94.5	94.5	-	91.9
Apex	9.4	36.5	-	-	24.0	11.5	-	38.3

Usage

Local Deployment

Note: For the best performance with Nex-series models, we recommend serving them with our customized sglang fork.

First, install our sglang fork:

# Use the customized `sglang` fork
git clone https://github.com/nex-agi/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"

Nex-N2-Pro

Launch the server (example on two 8× H100 servers with CUDA 13.0):

# Multi-node (2 nodes). Run the same command on every node with:
#   <node-rank> = 0 on the head node, 1 on the other node
#   <node0-ip>  = IP of the head node (reachable from all others)
python -m sglang.launch_server \
  --model-path /path/to/your/model \
  --tp 16 \
  --nnodes 2 \
  --node-rank <node-rank> \
  --dist-init-addr <node0-ip>:20000 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer
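
Because the same command runs on every node with only the rank changing, it can help to generate it. The helper below is a sketch, not part of the Nex-AGI tooling: `launch_command` and the head-node address `10.0.0.1` are hypothetical, while the flags and the relation tp = nodes × GPUs per node (2 × 8 = 16) follow the example above.

```python
import shlex


def launch_command(model_path, node_rank, head_ip, nnodes=2, gpus_per_node=8, port=20000):
    """Build the per-node launch command as a shell-safe string."""
    tp = nnodes * gpus_per_node  # tensor parallelism spans every GPU on every node
    args = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp),
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),
        "--dist-init-addr", f"{head_ip}:{port}",
        "--reasoning-parser", "qwen3",
        "--tool-call-parser", "qwen3_coder",
        "--mamba-scheduler-strategy", "extra_buffer",
    ]
    return shlex.join(args)


# 10.0.0.1 is a placeholder for the head node's IP address.
commands = [launch_command("/path/to/your/model", rank, "10.0.0.1") for rank in range(2)]
for command in commands:
    print(command)
```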

Nex-N2-mini

Launch the server (example on one 2× H100 server with CUDA 13.0):

python -m sglang.launch_server \
  --model-path /path/to/your/model \
  --tp 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer

Docker Deployment

We also provide a prebuilt Docker image with our customized sglang fork preinstalled: nexagi/sglang:v0.5.12. The launch command is the same as above.

Nex-N2-Pro

# Multi-node (2 nodes). Run the same command on every node with:
#   <node-rank> = 0 on the head node, 1 on the other node
#   <node0-ip>  = IP of the head node (reachable from all others)
docker run --gpus all --shm-size 32g --network host \
  -v /path/to/your/model:/model \
  nexagi/sglang:v0.5.12 \
  python3 -m sglang.launch_server \
    --model-path /model \
    --tp 16 \
    --nnodes 2 \
    --node-rank <node-rank> \
    --dist-init-addr <node0-ip>:20000 \
    --host 0.0.0.0 --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mamba-scheduler-strategy extra_buffer

Nex-N2-mini

Single node with 2× H100:

docker run --gpus all --shm-size 32g --ipc=host \
  -p 30000:30000 \
  -v /path/to/your/model:/model \
  nexagi/sglang:v0.5.12 \
  python3 -m sglang.launch_server \
    --model-path /model \
    --tp 2 \
    --host 0.0.0.0 --port 30000 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mamba-scheduler-strategy extra_buffer

Recommended Sampling Parameters

For the best generation quality, we recommend the following sampling parameters:

temperature: 0.7
top_p: 0.95
top_k: 40
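
As a sketch of how these values might be passed to a running server, the snippet below builds a chat-completions request with the recommended parameters. It assumes the server from the Docker example is listening on port 30000, that the served model name matches the repository ID (it may instead be the local path passed to `--model-path`), and that the server accepts `top_k` as an extra field in the request body; the helper names are hypothetical.

```python
import json
import urllib.request

# Recommended values from the card.
SAMPLING = {"temperature": 0.7, "top_p": 0.95, "top_k": 40}


def build_chat_request(prompt, model="edougawa/Nex-N2-mini-Abliterated"):
    """Return an OpenAI-style chat-completions payload with the sampling values."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING,
    }


def send(payload, url="http://localhost:30000/v1/chat/completions"):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_request("Explain tensor parallelism in two sentences.")
print(json.dumps(payload, indent=2))
# send(payload)  # uncomment once a server is reachable at the URL above
```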

Function Calling

Nex-series models support function calling. To enable it, pass the --tool-call-parser qwen3_coder flag when launching the server:

python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder
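
On the client side, a function-calling request typically carries a `tools` list in the OpenAI-compatible format, and the reply carries `tool_calls` with JSON-encoded arguments. The sketch below shows both halves. The `get_weather` tool and the sample assistant message are hypothetical, and the exact response shape is assumed to follow the OpenAI chat-completions convention.

```python
import json


def weather_tool():
    """A hypothetical tool definition in the OpenAI function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }


def parse_tool_calls(message):
    """Return (name, arguments) pairs from an assistant message, if any."""
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON string and must be decoded before use.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


request_body = {
    "model": "edougawa/Nex-N2-mini-Abliterated",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool()],
}

# Hand-written stand-in for a server reply; not real model output.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}
print(parse_tool_calls(sample_message))
```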

Reasoning Parser

Nex-series models emit explicit reasoning traces. Add the --reasoning-parser qwen3 flag to parse the reasoning content separately from the final response. It can be combined with the function-calling parser above:

python -m sglang.launch_server --model-path /path/to/your/model --tool-call-parser qwen3_coder --reasoning-parser qwen3
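
A client can then read the reasoning trace and the final answer separately. The sketch below assumes the parsed trace arrives in a `reasoning_content` field of the assistant message, and falls back to splitting on `<think>...</think>` tags when the parser is not enabled; both the field name and the tag format are assumptions about the server's output, and the sample messages are hand-written.

```python
import re

# Matches the first <think>...</think> span, including newlines.
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message):
    """Return (reasoning, answer) from an assistant message dict."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning is not None:
        # Server-side parser already separated the trace.
        return reasoning.strip(), content.strip()
    match = THINK.search(content)
    if match:
        # Fallback: strip the inline trace out of the raw content.
        return match.group(1).strip(), THINK.sub("", content, count=1).strip()
    return "", content.strip()


parsed = {"role": "assistant", "reasoning_content": "2 + 2 is basic arithmetic.", "content": "4"}
raw = {"role": "assistant", "content": "<think>2 + 2 is basic arithmetic.</think>\n4"}

print(split_reasoning(parsed))
print(split_reasoning(raw))
```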

Model size: 35B params (Safetensors)
Tensor type: BF16
