Qwen3.6-27B-NVFP4

This repository provides a ModelOpt-exported NVFP4 checkpoint derived from Qwen/Qwen3.6-27B.

What this repo contains

This repo includes:

  • ModelOpt NVFP4 quantized weights
  • hf_quant_config.json
  • config.json
  • tokenizer files
  • chat template files

The checkpoint is intended for deployment with runtimes that support ModelOpt-style FP4 / NVFP4 model loading.

Quantization summary

  • Base model: Qwen/Qwen3.6-27B
  • Quantization format: NVFP4
  • KV cache: FP8
  • Export style: Unified Hugging Face checkpoint
  • Primary target runtime: vLLM / compatible ModelOpt FP4 loaders
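
One way to check a downloaded copy against the summary above is to read hf_quant_config.json. The sketch below assumes the usual ModelOpt layout: a top-level "quantization" object with "quant_algo", "kv_cache_quant_algo", and "exclude_modules" keys. Those key names are an assumption, not something this repo confirms, so missing keys come back as None. The helper name read_quant_summary is hypothetical.

```python
import json
from pathlib import Path


def read_quant_summary(repo_dir):
    """Return the quantization fields found in hf_quant_config.json.

    Assumes a top-level "quantization" object (common ModelOpt layout).
    Keys that are absent are returned as None rather than raising.
    """
    config_path = Path(repo_dir) / "hf_quant_config.json"
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    quant = cfg.get("quantization", {})
    return {
        "quant_algo": quant.get("quant_algo"),
        "kv_cache_quant_algo": quant.get("kv_cache_quant_algo"),
        "exclude_modules": quant.get("exclude_modules"),
    }
```

Calling read_quant_summary("/path/to/this/repo") and comparing the result with the bullets above is a cheap sanity check before starting a server.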

Conversion notes

This checkpoint was produced after testing multiple export strategies.

The default NVFP4 export path produced structurally inconsistent checkpoints for this model family.
The working export used a more conservative recipe, close to MLP-only NVFP4 with an FP8 KV cache. That recipe preserved deployment correctness and passed basic reasoning, tool-calling, and smoke evaluations.

Validation summary

Basic validation completed successfully for:

  • short-answer generation
  • reasoning on/off behavior
  • streaming / non-streaming behavior
  • tool-call parsing
  • JSON smoke tests
  • basic code-generation smoke tests

Observed caveat:

  • Some long reasoning prompts spend a large share of the output-token budget on reasoning before producing the final answer, so long-chain reasoning tasks may need a higher output-token limit (max_tokens).
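
The caveat above can be handled on the client side by asking for a larger budget and checking whether the response was cut off. This is a minimal sketch, assuming the server is reachable under the served model name qwen3.6-27b-nvfp4 and returns standard OpenAI-style responses. The default of 8192 tokens is an illustrative value, not a tested recommendation, and both function names are hypothetical.

```python
def build_reasoning_request(prompt, max_tokens=8192):
    """Build a chat-completions payload with thinking enabled.

    max_tokens covers reasoning plus the final answer, so it is set
    higher than for a short-answer request. 8192 is only an example.
    """
    return {
        "model": "qwen3.6-27b-nvfp4",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": True},
        "max_tokens": max_tokens,
    }


def answer_was_truncated(response):
    """Return True if generation stopped because the token limit was hit."""
    return response["choices"][0].get("finish_reason") == "length"
```

If answer_was_truncated returns True for a response, resending the request with a larger max_tokens (within the server's --max-model-len) is the simplest remedy.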

Example deployment (vLLM)

docker run -d \
  --name qwen36-nvfp4 \
  --gpus '"device=0"' \
  --ipc=host \
  --network=host \
  --restart unless-stopped \
  -v /path/to/this/repo:/model \
  vllm/vllm-openai:cu130-nightly \
  /model \
  --served-model-name qwen3.6-27b-nvfp4 \
  --quantization modelopt_fp4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 32768 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --language-model-only \
  --port 8000

Example test request:

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-27b-nvfp4",
    "messages": [{"role":"user","content":"Hello. Please answer only: test successful."}],
    "chat_template_kwargs": {"enable_thinking": false},
    "max_tokens": 64,
    "temperature": 0.0
  }'
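
The same endpoint can be exercised for tool calling from Python with only the standard library. The sketch below assumes the server from the deployment example is running on localhost:8000 with --enable-auto-tool-choice, and that responses follow the OpenAI chat-completions shape (tool calls under choices[0].message.tool_calls, with arguments as a JSON string). The get_weather tool and all helper names are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Hypothetical tool definition in OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt):
    """Build a chat-completions payload that offers one tool to the model."""
    return {
        "model": "qwen3.6-27b-nvfp4",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
        "chat_template_kwargs": {"enable_thinking": False},
        "max_tokens": 256,
        "temperature": 0.0,
    }


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the parsed JSON."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_tool_calls(response):
    """Return (name, arguments) pairs; arguments are decoded from JSON."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]
```

With the server up, extract_tool_calls(post_chat(build_tool_request("What is the weather in Paris?"))) should yield the parsed calls, or an empty list if the model answered in plain text.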

Acknowledgements

  • Base model: Qwen/Qwen3.6-27B
  • Quantization / export workflow: NVIDIA Model Optimizer ecosystem
