Qwable-5-27B-Coder banner

Qwable NVFP4 serving lane

Qwable-5-27B-Coder-NVFP4

Qwable-5-27B-Coder-NVFP4 is the compact NVIDIA ModelOpt release of Qwable: a Qwen3.6-based coder-agent tune trained first on Claude Fable 5 traces, then continued on Kimi 2.7 Coder traces.

This is the serving-oriented checkpoint: NVFP4 safetensors for runtimes that understand ModelOpt quantization. Use the GGUF repo for llama.cpp and Ollama.

Support on Ko-fi

Serving shape

Item Value
Format ModelOpt NVFP4 safetensors
Producer nvidia-modelopt 0.44.0
Quant algorithm NVFP4
Group size 16
Weight files 2 safetensors shards
Approx. size 19.7 GB total
Runtime target vLLM / TensorRT-LLM / NVIDIA ModelOpt-compatible serving
Source checkpoint DJLougen/Qwable-5-27B-Coder
BF16 source checkpoint
  -> NVIDIA ModelOpt export
      -> NVFP4 Linear targets, group size 16
      -> lm_head / SSM conv1d / visual modules excluded
      -> compact safetensors serving repo

What carries over from Qwable

The NVFP4 checkpoint is a quantized release of the same coder-agent model. The target workload remains repository navigation, patch planning, terminal-output analysis, verifier recovery, tool-shaped answers, and long-context coding prompts.

Early maintainer runs show the source Qwable checkpoint outperforming the base model on a private coder benchmark. Public benchmark details are pending, so treat that as early maintainer signal rather than a reproducible leaderboard claim.

Quantization details

From hf_quant_config.json:

Field Value
Producer modelopt
Producer version 0.44.0
Quant algorithm NVFP4
Group size 16
KV cache quantization Not set in this checkpoint
Higher-precision exclusions lm_head, SSM conv1d modules, and model.visual*

The config records 4-bit floating-point weights and input activations for Linear targets. The excluded modules are intentional: they preserve sensitive output, SSM convolution, and vision-tower components outside the main NVFP4 target set.

Quickstart

Install a runtime with support for ModelOpt NVFP4 checkpoints. Exact package versions change quickly; prefer the current vLLM or TensorRT-LLM documentation for your CUDA stack.

Download locally:

hf download DJLougen/Qwable-5-27B-Coder-NVFP4 --local-dir Qwable-5-27B-Coder-NVFP4

Example vLLM-style serving command, adjusted for your installed version and GPU topology:

vllm serve DJLougen/Qwable-5-27B-Coder-NVFP4 \
  --tensor-parallel-size 1 \
  --max-model-len 32768

If your runtime does not recognize the ModelOpt quantization config, use the BF16 source checkpoint or the GGUF release instead.

Recommended use

  • Use this repo for NVIDIA-serving experiments where NVFP4 is supported directly.
  • Use the BF16 repo for conversion, further training, or quality-ceiling evaluation.
  • Use the GGUF repo for llama.cpp and Ollama workflows.
  • Keep benchmark comparisons identical across model variants: same prompts, context, sampling, max tokens, and tool schema exposure.

Related releases

Limitations

  • Public benchmark tables are pending.
  • NVFP4 runtime compatibility depends on serving stack, CUDA version, GPU architecture, and package versions.
  • This is not a GGUF repo; llama.cpp users should use the GGUF release.
  • Quantization can change instruction following, code precision, and tool-call reliability. Validate on your own tasks.
  • Vision components are excluded from the main NVFP4 target set, but this release is marketed for coding behavior, not vision improvement.
  • Safety behavior is inherited from the base model and fine-tuning data; no separate safety alignment claim is made here.

License

Released under Apache-2.0, following the upstream base model license metadata.

Downloads last month
51
Safetensors
Model size
15B params
Tensor type
BF16
F8_E4M3
U8
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for DJLougen/Qwable-5-27B-Coder-NVFP4

Base model

Qwen/Qwen3.6-27B
Quantized
(5)
this model