Qwable NVFP4 serving lane

Qwable-5-27B-Coder-NVFP4

Qwable-5-27B-Coder-NVFP4 is the compact NVIDIA ModelOpt release of Qwable: a Qwen3.6-based coder-agent tune trained first on Claude Fable 5 traces, then continued on Kimi 2.7 Coder traces.

This is the serving-oriented checkpoint: NVFP4 safetensors for runtimes that understand ModelOpt quantization. Use the GGUF repo for llama.cpp and Ollama.

Serving shape

Item	Value
Format	ModelOpt NVFP4 safetensors
Producer	`nvidia-modelopt` 0.44.0
Quant algorithm	NVFP4
Group size	16
Weight files	2 safetensors shards
Approx. size	19.7 GB total
Runtime target	vLLM / TensorRT-LLM / NVIDIA ModelOpt-compatible serving
Source checkpoint	`DJLougen/Qwable-5-27B-Coder`

BF16 source checkpoint
  -> NVIDIA ModelOpt export
      -> NVFP4 Linear targets, group size 16
      -> lm_head / SSM conv1d / visual modules excluded
      -> compact safetensors serving repo

What carries over from Qwable

The NVFP4 checkpoint is a quantized release of the same coder-agent model. The target workload remains repository navigation, patch planning, terminal-output analysis, verifier recovery, tool-shaped answers, and long-context coding prompts.

Early maintainer runs show the source Qwable checkpoint outperforming the base model on a private coder benchmark. Public benchmark details are pending, so treat that as early maintainer signal rather than a reproducible leaderboard claim.

Quantization details

From hf_quant_config.json:

Field	Value
Producer	`modelopt`
Producer version	`0.44.0`
Quant algorithm	`NVFP4`
Group size	`16`
KV cache quantization	Not set in this checkpoint
Higher-precision exclusions	`lm_head`, SSM `conv1d` modules, and `model.visual*`

The config records 4-bit floating-point weights and input activations for Linear targets. The excluded modules are intentional: they preserve sensitive output, SSM convolution, and vision-tower components outside the main NVFP4 target set.

Quickstart

Install a runtime with support for ModelOpt NVFP4 checkpoints. Exact package versions change quickly; prefer the current vLLM or TensorRT-LLM documentation for your CUDA stack.

Download locally:

hf download DJLougen/Qwable-5-27B-Coder-NVFP4 --local-dir Qwable-5-27B-Coder-NVFP4

Example vLLM-style serving command, adjusted for your installed version and GPU topology:

vllm serve DJLougen/Qwable-5-27B-Coder-NVFP4 \
  --tensor-parallel-size 1 \
  --max-model-len 32768

If your runtime does not recognize the ModelOpt quantization config, use the BF16 source checkpoint or the GGUF release instead.

Recommended use

Use this repo for NVIDIA-serving experiments where NVFP4 is supported directly.
Use the BF16 repo for conversion, further training, or quality-ceiling evaluation.
Use the GGUF repo for llama.cpp and Ollama workflows.
Keep benchmark comparisons identical across model variants: same prompts, context, sampling, max tokens, and tool schema exposure.

Related releases

Source BF16 Transformers checkpoint: DJLougen/Qwable-5-27B-Coder
llama.cpp GGUF release: DJLougen/Qwable-5-27B-Coder-GGUF

Limitations

Public benchmark tables are pending.
NVFP4 runtime compatibility depends on serving stack, CUDA version, GPU architecture, and package versions.
This is not a GGUF repo; llama.cpp users should use the GGUF release.
Quantization can change instruction following, code precision, and tool-call reliability. Validate on your own tasks.
Vision components are excluded from the main NVFP4 target set, but this release is marketed for coding behavior, not vision improvement.
Safety behavior is inherited from the base model and fine-tuning data; no separate safety alignment claim is made here.

License

Released under Apache-2.0, following the upstream base model license metadata.

Downloads last month: 51

Safetensors

Model size

15B params

Tensor type

BF16

F8_E4M3

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DJLougen/Qwable-5-27B-Coder-NVFP4

Base model

Qwen/Qwen3.6-27B

Finetuned

unsloth/Qwen3.6-27B

Finetuned

DJLougen/Qwable-5-27B-Coder

Quantized

(5)

this model