Qwable-v1 NVFP4 GGUF

GGUF quantization of lordx64/Qwable-v1 โ€” an agentic coding model built by layering Claude Fable-5 tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

Model Details

  • Architecture: Qwen3.5 MoE, 41 blocks (40 layers + 1 MTP head), 256 experts (8 active/token)
  • Active Parameters: ~3B
  • Context: 262,144 tokens
  • Vision: Yes (27-layer SigLIP ViT)
  • License: AGPL-3.0
  • Base: Qwen3.6-35B-A3B โ†’ Opus 4.7 reasoning distill โ†’ Fable-5 agentic SFT

What's Included

File Type Size BPW
qwable-v1-nvfp4.gguf NVFP4 (all tensors) ~18.81 GB 4.55
mmproj-qwable-v1-f16.gguf Vision projector (F16) ~0.88 GB F16

Quantization Details

NVFP4

  • All weight tensors quantized to NVFP4 (4-bit floating point)
  • F32/F16 bias and norm tensors kept at original precision
  • Vision encoder and projector kept at F16 (not quantized)

Usage

# With llama.cpp (vision + text)
./llama-server -m qwable-v1-nvfp4.gguf --mmproj mmproj-qwable-v1-f16.gguf --host 0.0.0.0 --port 8080

# Text-only (no vision)
./llama-cli -m qwable-v1-nvfp4.gguf -p "Hello, how are you?"

Agentic Tool-Use

Qwable-v1 emits <tool_use> XML when prompted with an agent-style system prompt:

system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>

Without the agent prompt, the model falls back to the Opus 4.7 reasoning prior (markdown code blocks).

Credits

Downloads last month
12
GGUF
Model size
36B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support