Instructions to use maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4")
model = AutoModelForMultimodalLM.from_pretrained("maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4

SGLang

How to use maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4 with Docker Model Runner:
```
docker model run hf.co/maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4
```

◆ Rogue Quants · NVFP4

🪐 Qwopus3.6-27B-Coder abliterated · NVFP4

🔓 Abliterated

27B coder vision-language · abliterated (2-round Heretic) · agentic + tool-calling · thinking · GPTQ NVFP4 W4A4

⚙️ NVFP4 · W4A4 💾 ~18 GB 📉 PPL 6.73 📐 256K context 🚀 vLLM · Blackwell 🔓 Abliterated 💻 Coder 🚫 Refusals 3/100

- Refusals: 99 → 3 of 100 prompts, over 2 rounds
- KL divergence: 0.0398 (below 0.5 means capability kept)
- Size on disk: 18 GB vs 55.6 GB bf16 (~33%)
- Context: 256K (262144 tokens)

TL;DR: Qwopus3.6-27B-Coder abliterated, quantized to NVFP4 (W4A4) for vLLM on NVIDIA Blackwell. 18 GB, wikitext-2 PPL 6.73, 256K agentic coder, refusals removed (2-round Heretic).

Qwopus3.6-27B-Coder abliterated NVFP4

Jackrong/Qwopus3.6-27B-Coder, abliterated (refusal direction removed) with Heretic in two iterative rounds, then quantized to NVFP4 (W4A4) in the compressed-tensors nvfp4-pack-quantized format with llm-compressor (GPTQ + MSE, shared fused-layer scales).

Near-lossless and decensored. Two Heretic rounds cut refusals from 99/100 to 3/100 of held-out harmful prompts while keeping a KL divergence of 0.0398 to the original model (well under the 0.5 line that signals capability damage). NVFP4 then compresses to ~18 GB with a wikitext-2 perplexity of 6.73.

Built for vLLM on NVIDIA Blackwell (4-bit weight + 4-bit activation). Pre-Blackwell GPUs run it weight-only.
Loading and generation verified in vLLM v0.23.0 on an NVIDIA GB10 (Blackwell, sm_121).

Uncensored / abliterated model. It follows instructions without refusal guardrails. The abliteration only removes refusals; all other behaviour comes from the base model. You are responsible for how you use it.

Fidelity

Near-lossless versus the bf16 source: wikitext-2 perplexity for this build is 6.73.

| Metric | Value |
| --- | --- |
| wikitext-2 PPL | 6.73 |
| Weights | NVFP4 W4A4, group 16 |
| Size | 18 GB vs 55.6 GB bf16 (~33%) |
| KL divergence | 0.0398 (capability preservation, lower is better) |

This build quantizes to NVFP4 with GPTQ error compensation, an MSE observer, and shared fused-layer scales, which keeps the drop from bf16 minimal.
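The two fidelity metrics in the table can be sketched in a few lines. This is a minimal illustration of the standard definitions (perplexity as the exponential of the mean negative log-likelihood, KL divergence between two next-token distributions), not the evaluation code used for this build. The card does not state which prompts or token positions the reported KL is averaged over, and the numbers below are toy values rather than measurements.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )

# Toy next-token distributions (hypothetical, three-token vocabulary).
original = [0.70, 0.20, 0.10]   # stands in for the original model
modified = [0.65, 0.25, 0.10]   # stands in for the abliterated model

print(kl_divergence(original, modified))
print(perplexity([math.log(0.5)] * 4))
```

A KL of zero means the two distributions are identical, which is why a small value is read as "capability kept".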

Quickstart

NVFP4 is auto-detected from config.json (compressed-tensors); no quantization flag needed. --reasoning-parser qwen3 splits the <think> block into reasoning_content; --tool-call-parser qwen3_coder enables tool/function calling for agentic coding.

vllm serve maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4 \
  --served-model-name qwopus-27b-coder-abliterated-nvfp4 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.90 \
  --kv-cache-dtype fp8 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser qwen3_coder

- Supports up to 262144 tokens; keep at least 128K to preserve thinking quality. --max-model-len 131072 is a safe default; raise it if memory allows.
- Add --language-model-only to skip the vision tower and free KV cache for text use.
- The parser flags are not auto-detected; pass them explicitly.
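To make the effect of --reasoning-parser concrete, here is a rough client-side illustration of the split it performs. It is a simplified stand-in, not vLLM's parser: the helper name `split_reasoning` is hypothetical, and it assumes the reasoning ends at a literal `</think>` tag, with or without an opening `<think>` in the generated text.

```python
def split_reasoning(text):
    """Split raw model output into (reasoning, answer) at the </think> tag."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag: treat everything as the answer.
        return None, text.strip()
    # Drop an opening <think> tag if the output contains one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()

raw = "<think>Two pointers, one per list.</think>Use a two-pointer merge."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

With the flag set on the server, the client should not need this: the reasoning arrives in reasoning_content and the answer in content.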

Python (OpenAI client)

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
r = client.chat.completions.create(
    model="qwopus-27b-coder-abliterated-nvfp4",
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
)
print(r.choices[0].message.content)

curl

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwopus-27b-coder-abliterated-nvfp4",
  "messages": [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
}'
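Tool calling

A minimal sketch of a tool-calling request body, assuming the server was started with --enable-auto-tool-choice and --tool-call-parser qwen3_coder as shown above. The request uses the OpenAI function-calling schema. The `run_tests` tool and both helper functions are hypothetical and exist only for illustration, and the sample assistant message is hand-written, not captured model output.

```python
import json

# Hypothetical tool; the name and parameters are not part of this model card.
RUN_TESTS_TOOL = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the failures.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory to test"}
            },
            "required": ["path"],
        },
    },
}

def build_request(prompt, tools, model="qwopus-27b-coder-abliterated-nvfp4"):
    """Build a /v1/chat/completions body that offers tools to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",
    }

def parse_tool_calls(message):
    """Decode the JSON-encoded arguments of each tool call in an assistant message."""
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]

body = build_request("Run the tests under ./src and fix any failure.", [RUN_TESTS_TOOL])
print(json.dumps(body, indent=2))

# Hand-written stand-in for an assistant message that requests a tool call.
sample_message = {
    "role": "assistant",
    "tool_calls": [
        {"type": "function",
         "function": {"name": "run_tests", "arguments": "{\"path\": \"./src\"}"}}
    ],
}
print(parse_tool_calls(sample_message))
```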

About the base model

A 27B Qwen3.5-family vision-language model specialized for code (Qwopus 3.6 Coder), with thinking-mode reasoning and a 256K context window.

- 64 decoder layers: hybrid gated delta-net linear attention plus full attention, dense MLP, plus a vision tower for image and video input.
- 256K context (max_position_embeddings 262144).
- Thinking mode by default, with an instruct toggle.

Abliteration

Heretic removes the refusal direction with a TPE-optimized search over per-component ablation strength, jointly minimizing refusal rate and KL divergence from the original model, then merges the best trial. This model was abliterated in two iterative rounds: round 1 removed the dominant refusal direction, then Heretic was re-run on the round-1 model to remove the residual refusal direction that surfaced once the first was gone. Because Qwopus is a thinking model, evaluation ran in non-thinking mode so each judged response is a real answer rather than an unfinished <think> block; the shipped model restores the original thinking chat template.

| Round | Refusals | Note |
| --- | --- | --- |
| Baseline | 99/100 | original model |
| Round 1 | 25/100 | dominant refusal direction removed |
| Round 2 | 3/100 | residual direction removed (final) |

Datasets: mlabonne/harmless_alpaca (good) vs mlabonne/harmful_behaviors (bad).
Final KL divergence 0.0398 (capability preserved, well under the 0.5 damage line).
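The core operation behind "removing a refusal direction" can be sketched as generic directional ablation. The direction is estimated as the difference of mean activations between harmful and harmless prompts, then projected out of a weight matrix that writes into the residual stream. This is a toy under stated assumptions, not Heretic's implementation: the function names are hypothetical, the activations are made-up numbers, and the single `strength` argument only stands in for the per-component ablation strength that Heretic searches over.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def normalize(v):
    """Scale a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (harmful - harmless)."""
    mu_bad = mean_vector(harmful_acts)
    mu_good = mean_vector(harmless_acts)
    return normalize([b - g for b, g in zip(mu_bad, mu_good)])

def ablate(weight, direction, strength=1.0):
    """Return W - strength * r (r^T W); rows of W index residual-stream dims."""
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(direction[i] * weight[i][j] for i in range(n_rows))
            for j in range(n_cols)]
    return [[weight[i][j] - strength * direction[i] * proj[j]
             for j in range(n_cols)]
            for i in range(n_rows)]

# Toy activations (hypothetical): the two prompt sets differ only in dimension 0.
harmful = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
harmless = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
r = refusal_direction(harmful, harmless)

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(ablate(W, r))                 # full removal along r
print(ablate(W, r, strength=0.5))   # partial removal
```

At full strength the layer can no longer write anything along the direction. A lower strength trades some remaining refusals for a smaller change to the model, which is the trade-off the refusal-rate and KL objectives capture.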

Quantization


| Setting | Value |
| --- | --- |
| Scheme | NVFP4, W4A4 |
| Weight rounding | GPTQ (Hessian-based error compensation), MSE observer |
| Weights | FP4 (E2M1), `group_size=16`, `tensor_group`, FP8 (E4M3) group scales, shared across fused layers |
| Activations | FP4, dynamic per-group, FP8 (E4M3) scales |
| Quantized | all language-model `Linear` layers |
| Kept in bf16 | vision tower (`model.visual.*`), `lm_head`, MTP head |
| Untouched | gated delta-net `Conv1d` and SSM params (`A_log`, `dt_bias`), never `Linear` |

GPTQ adds cost only at quantization time: the output format and inference speed are identical to plain round-to-nearest NVFP4, but GPTQ chooses better 4-bit values.
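For intuition about the weight format, here is a minimal round-to-nearest sketch of group-wise FP4 (E2M1) quantization with one scale per 16 values. It is deliberately simplified and is not the llm-compressor code path: the group scale stays a Python float instead of being stored as FP8 (E4M3), the tensor-level scale is ignored, and GPTQ error compensation is not applied. All function names are hypothetical.

```python
# Representable magnitudes of FP4 E2M1 (sign is stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
GROUP_SIZE = 16

def quantize_group(values):
    """Round one group to FP4 with a shared scale; returns (scale, codes)."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # largest magnitude maps to 6.0
    codes = []
    for v in values:
        target = abs(v) / scale
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(target - m))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes

def dequantize_group(scale, codes):
    """Reconstruct approximate values from the scale and FP4 codes."""
    return [scale * c for c in codes]

def bits_per_weight(group_size=GROUP_SIZE, value_bits=4, scale_bits=8):
    """Storage per weight: the FP4 value plus its share of the group scale."""
    return value_bits + scale_bits / group_size

group = [0.0, 0.3, -0.6, 1.2] + [0.0] * 12   # toy weights, one group of 16
scale, codes = quantize_group(group)
print(codes[:4])
print(dequantize_group(scale, codes)[:4])
print(bits_per_weight())
```

GPTQ keeps this storage layout and changes only how the codes are chosen: rounding decisions are adjusted so that the layer's output error, not each weight's individual error, stays small.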

Calibration: 512 domain-matched samples (long reasoning + general chat + code), max_seq_len=2048, text-only path through the VL model.

Recommended sampling

Thinking mode is the default.

Thinking, precise coding: temperature=0.6, top_p=0.95, top_k=20
Thinking, general: temperature=1.0, top_p=0.95, top_k=20
Instruct / non-thinking: temperature=0.7, top_p=0.80, top_k=20
To run non-thinking, set {%- set enable_thinking = false %} in the chat template, or pass extra_body={"chat_template_kwargs": {"enable_thinking": False}} from the Python OpenAI client (use false in a raw JSON request body).
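The presets above can be kept in one place and turned into request arguments. This is a sketch: the preset names and the `request_kwargs` helper are hypothetical, and it assumes the vLLM server accepts top_k through extra_body, since top_k is not a standard OpenAI chat-completions field.

```python
# Values copied from the recommendations above; the key names are made up.
SAMPLING_PRESETS = {
    "thinking_coding": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "thinking_general": {"temperature": 1.0, "top_p": 0.95, "top_k": 20},
    "instruct": {"temperature": 0.7, "top_p": 0.80, "top_k": 20},
}

def request_kwargs(preset, thinking=True):
    """Build keyword arguments for client.chat.completions.create()."""
    p = SAMPLING_PRESETS[preset]
    # Assumption: non-OpenAI sampling fields travel in extra_body.
    extra_body = {"top_k": p["top_k"]}
    if not thinking:
        extra_body["chat_template_kwargs"] = {"enable_thinking": False}
    return {
        "temperature": p["temperature"],
        "top_p": p["top_p"],
        "extra_body": extra_body,
    }

print(request_kwargs("thinking_coding"))
print(request_kwargs("instruct", thinking=False))
```

The result can be unpacked into the client call from the Quickstart, for example client.chat.completions.create(model=..., messages=..., **request_kwargs("instruct", thinking=False)).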

Base model: Jackrong/Qwopus3.6-27B-Coder
Space: Rogue Quants
Collection: NVFP4 Quants
Sibling NVFP4 quants:
- Qwopus3.6-27B-Coder (the censored version)
- Qwopus3.6-27B-v2 abliterated
- Qwen3.6-40B Deckard
- Huihui-Qwythos-9B Claude-Mythos
- Ornith-1.0-9B abliterated

Notes

Needs an NVIDIA Blackwell GPU for accelerated W4A4 (verified on GB10, sm_121); pre-Blackwell GPUs run it weight-only.
--reasoning-parser and --tool-call-parser are not auto-detected; pass them explicitly.
Thinking mode is on by default; toggle it via the chat template or chat_template_kwargs.
No refusal guardrails; you are responsible for how you use it.

License

Apache-2.0, following the base model. Intended use and all responsibility for use follow the base model.

Credits

Base model: Jackrong
Abliteration: Heretic by Philipp Emanuel Weidmann
Quantization tooling: llm-compressor / compressed-tensors

Part of 🎲 Rogue Quants, a set of NVFP4 (W4A4) quants for vLLM on Blackwell. See the full NVFP4 Quants collection.
Built on NVIDIA GB10 (Blackwell, sm_121) with llm-compressor · GPTQ + MSE · shared fused-layer scales.


Safetensors

Model size: 16B params
Tensor type: F32 · BF16 · F8_E4M3

Model tree for maci0/Qwopus3.6-27B-Coder-abliterated-NVFP4

Base model: Jackrong/Qwopus3.6-27B-v2
Adapter: Jackrong/Qwopus3.6-27B-Coder
Quantized (26): this model
