NeuralNet is a pioneering AI solutions provider that empowers businesses to harness the power of artificial intelligence.


🌟 Qwen3.6-27B Uncensored NVFP4 Quantization by NeuralNet 🧠🤖

This is an NVFP4-quantized version of NeuralNet-Hub/Qwen3.6-27B-Uncensored, produced through a combination of fine-tuning and abliteration over Qwen/Qwen3.6-27B. It is optimized for deployment on NVIDIA Blackwell architecture GPUs using vLLM.

NVFP4 quantization requires an NVIDIA Blackwell GPU (GB200, GeForce RTX 50 series, etc.). The format is not compatible with Ampere, Ada Lovelace, or Hopper GPUs. On an older GPU, use a different quantization format.


🔓 No Filters. No Limits. Just Answers.

This model powers UncensoredGPT

Ask anything. Get real answers. No restrictions.


Most AI models are trained to refuse. They hedge, they deflect, they lecture. UncensoredGPT is built on the opposite philosophy: that access to information should be unrestricted, and that adults are capable of deciding what they need to know.

This model is the engine behind UncensoredGPT, a platform providing unfiltered, honest responses for cybersecurity, education, content creation, research, or straightforward conversation. The refusals and content restrictions present in the original Qwen3.6-27B have been removed through a combination of supervised fine-tuning and abliteration, resulting in a model that responds directly across topics that standard models typically refuse.

Why stay in the system when you can have unrestricted answers, privacy by default, and complete freedom of information?

Ready to experience the freedom of unrestricted AI? Join the waitlist at uncensoredgpt.ai — limited spots available.


Quantization Details

This model was quantized to NVFP4 (4-bit NVIDIA Floating Point) using vLLM's built-in quantization pipeline. NVFP4 leverages native FP4 Tensor Core support introduced in Blackwell GPUs, delivering significant memory savings and throughput improvements with minimal quality degradation compared to BF16.

vllm quantize \
  --model NeuralNet-Hub/Qwen3.6-27B-Uncensored \
  --quantization nvfp4 \
  --output-dir NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4
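
As a rough illustration of the memory savings, the sketch below estimates weight storage for BF16 versus NVFP4. It assumes the usual NVFP4 layout (4-bit values plus one FP8 scale per block of 16 weights, about 4.5 bits per weight) and a nominal 27 billion parameters. Layers left unquantized, the KV cache, and activations are not counted, so real usage will be higher than these figures.

```python
# Back-of-the-envelope weight-memory estimate (illustrative only).
# Assumptions: 27e9 parameters, every weight quantized, and NVFP4 storing
# 4-bit values plus one 8-bit (FP8) scale per block of 16 weights.

NUM_PARAMS = 27e9
BLOCK_SIZE = 16
NVFP4_BITS_PER_WEIGHT = 4 + 8 / BLOCK_SIZE  # 4.5 bits including block scales


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
nvfp4_gb = weight_memory_gb(NUM_PARAMS, NVFP4_BITS_PER_WEIGHT)

print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
print(f"NVFP4 weights: ~{nvfp4_gb:.1f} GB")
print(f"Reduction:     ~{bf16_gb / nvfp4_gb:.1f}x")
```

The estimate covers weights only; the context length and number of concurrent sequences determine how much additional VRAM the KV cache needs.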

⚡ Deployment with vLLM

This quantized model is intended to be served using vLLM (vllm>=0.9.0 recommended).

Quick Start

vllm serve NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4 \
  --quantization nvfp4 \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-model-len 262144 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder

Using a Config File

# Deploy with: vllm serve --config config.yaml
# Optimized for NVIDIA RTX PRO 6000 Blackwell
# Benchmarked: ~85-90 parallel requests, up to 1000 tok/sec at higher context lengths

model: NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4
dtype: bfloat16
kv-cache-dtype: fp8
gpu-memory-utilization: 0.95
max-model-len: 262144
max-num-batched-tokens: 4096
max-num-seqs: 200
max-cudagraph-capture-size: 209
enable-prefix-caching: true
trust-remote-code: true

reasoning-parser: qwen3
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder

default-chat-template-kwargs: '{"enable_thinking": false}'

download-dir: /workspace/models
host: 0.0.0.0
port: 18000

Start the server with:

vllm serve --config config.yaml
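
Loading a model of this size can take a while, so it may help to wait for the server before sending requests. The sketch below polls the `/health` endpoint of vLLM's OpenAI-compatible server using only the standard library. The helper names are hypothetical, and port 18000 is taken from the sample config; adjust it if your setup differs.

```python
import time
import urllib.request


def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def wait_until_ready(base_url: str, attempts: int = 60, delay: float = 5.0) -> bool:
    """Poll the health endpoint until it responds or attempts run out."""
    for _ in range(attempts):
        if server_ready(base_url):
            return True
        time.sleep(delay)
    return False


# Example usage (uncomment once the server has been started):
# if wait_until_ready("http://localhost:18000"):
#     print("server is ready")
```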

💬 Chat API Usage

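Qwen3.6 uses a standard chat template compatible with OpenAI-format APIs. Thinking mode is enabled by default in the chat template. Note that the sample config file above overrides this with default-chat-template-kwargs: '{"enable_thinking": false}', so when serving with that config, pass "chat_template_kwargs": {"enable_thinking": True} in extra_body to turn thinking back on for a request.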

Thinking Mode (Default)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18000/v1", api_key="EMPTY")

messages = [{"role": "user", "content": "Your message here"}]

response = client.chat.completions.create(
    model="NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,
    top_p=0.95,
    extra_body={"top_k": 20},
)
print(response.choices[0].message.content)
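
When the server runs with `--reasoning-parser qwen3`, the thinking text is separated from the final answer. The sketch below shows one way to read both parts. It assumes the reasoning arrives as an extra `reasoning_content` attribute on the message (some vLLM versions may name it `reasoning`), and `split_reasoning` is a hypothetical helper, not part of any library.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def split_reasoning(message) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a chat completion message.

    Assumption: the server attaches the thinking text as `reasoning_content`
    (or `reasoning` in some versions); the field is absent when thinking is off.
    """
    reasoning = getattr(message, "reasoning_content", None) or getattr(
        message, "reasoning", None
    )
    return reasoning, message.content


# Stand-in object so the sketch runs without a server; with a live server,
# pass response.choices[0].message instead.
demo = SimpleNamespace(reasoning_content="2 + 2 is 4.", content="4")
reasoning, answer = split_reasoning(demo)
print("reasoning:", reasoning)
print("answer:", answer)
```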

Non-Thinking (Instruct) Mode

response = client.chat.completions.create(
    model="NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4",
    messages=messages,
    max_tokens=8192,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    },
)

Image Input

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

response = client.chat.completions.create(
    model="NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,
    top_p=0.95,
    extra_body={"top_k": 20},
)
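
For images that are not reachable by URL, a common approach is to embed the file as a base64 data URL in the same `image_url` field. The following is a minimal sketch under that assumption; `image_to_data_url` and `image_message` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(path: str, prompt: str) -> list:
    """Build a messages list in the same shape as the URL-based example."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The returned list can be passed as `messages` to `client.chat.completions.create` exactly as in the example above.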

⚙️ Recommended Sampling Parameters

| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general tasks) | 1.0 | 0.95 | 20 | 0.0 |
| Thinking (precise coding) | 0.6 | 0.95 | 20 | 0.0 |
| Instruct (non-thinking) | 0.7 | 0.80 | 20 | 1.5 |
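
The recommended values can be kept in one place and expanded into request arguments. The sketch below is one possible arrangement; the preset keys and the `sampling_kwargs` helper are hypothetical names. It sets `enable_thinking` explicitly for every mode so the result does not depend on the server's default.

```python
SAMPLING_PRESETS = {
    "thinking_general": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "thinking_coding": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "instruct": {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "presence_penalty": 1.5},
}


def sampling_kwargs(mode: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    preset = SAMPLING_PRESETS[mode]
    return {
        "temperature": preset["temperature"],
        "top_p": preset["top_p"],
        "presence_penalty": preset["presence_penalty"],
        # top_k and chat_template_kwargs are not standard OpenAI fields,
        # so they travel in extra_body.
        "extra_body": {
            "top_k": preset["top_k"],
            "chat_template_kwargs": {"enable_thinking": mode != "instruct"},
        },
    }


print(sampling_kwargs("thinking_coding"))
```

A call then looks like `client.chat.completions.create(model=..., messages=messages, **sampling_kwargs("instruct"))`.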

📥 Download with huggingface-cli

Install the CLI

pip install -U "huggingface_hub[cli]"

Download the Full Repository

huggingface-cli download NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4 --local-dir ./Qwen3.6-27B-Uncensored-NVFP4

Download Specific Files

huggingface-cli download NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4 \
  --include "*.safetensors" \
  --local-dir ./Qwen3.6-27B-Uncensored-NVFP4
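
The same downloads can be scripted from Python with `snapshot_download` from the `huggingface_hub` package. This is a minimal sketch; `download_model` is a hypothetical wrapper, and the file patterns in the usage note are assumptions about what the repository contains.

```python
from typing import List, Optional

REPO_ID = "NeuralNet-Hub/Qwen3.6-27B-Uncensored-NVFP4"


def download_model(local_dir: str, patterns: Optional[List[str]] = None) -> str:
    """Download the repository, or only files matching `patterns`.

    Returns the local directory path reported by huggingface_hub.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=patterns,  # None means "download everything"
    )


# Example usage (requires network access and `pip install huggingface_hub`):
# download_model("./Qwen3.6-27B-Uncensored-NVFP4")
# download_model("./Qwen3.6-27B-Uncensored-NVFP4", patterns=["*.safetensors"])
```

Downloading only `*.safetensors` fetches the weights alone; serving the model typically also needs the config and tokenizer files, so a pattern list such as `["*.safetensors", "*.json"]` may be more practical.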

🔧 Hardware Requirements

| Component | Requirement |
|---|---|
| GPU Architecture | NVIDIA Blackwell (sm_100+) |
| VRAM | 24 GB+ recommended |
| CUDA | 12.8+ |
| vLLM | 0.9.0+ |

NVFP4 is supported only on NVIDIA Blackwell GPUs. Attempting to run this model on Ampere (A100), Ada Lovelace (RTX 40 series), or Hopper (H100) will fail. For those architectures, use the original BF16 model or an AWQ/GPTQ quantized variant.
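
To check a machine before downloading, the GPU's compute capability can be compared against the Blackwell threshold. The sketch below assumes that a major version of 10 or higher corresponds to the sm_100+ requirement, and that the installed driver's `nvidia-smi` supports the `compute_cap` query field. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def parse_compute_cap(text: str) -> Tuple[int, int]:
    """Parse a string such as '12.0' into (major, minor)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def supports_nvfp4(major: int, minor: int = 0) -> bool:
    """Assumption: sm_100+ means compute capability 10.0 or higher."""
    return major >= 10


def local_gpu_caps() -> List[Tuple[int, int]]:
    """Query the compute capability of each local GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_compute_cap(line) for line in out.splitlines() if line.strip()]


# Example usage on a machine with an NVIDIA driver installed:
# for cap in local_gpu_caps():
#     print(cap, "OK" if supports_nvfp4(*cap) else "NVFP4 not supported")
```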


🌐 Contact Us


Website: https://neuralnet.solutions
Email: info[at]neuralnet.solutions
