NobodyWho/Qwen_Qwen3.5-4B-GGUF

Overview

GGUF quantization of Qwen3.5-4B, prepared for NobodyWho: it works with NobodyWho out of the box, with Qwen's recommended sampling metadata embedded in every quant, and is verified with NobodyWho's test suite. Qwen3.5 is Alibaba's latest small-model family — natively multimodal (text + image), with strong reasoning and best-in-class tool calling for its size.

Model Capabilities

  • Text generation — instruction-following chat
  • Tool calling — native function calling with grammar-constrained output (14/14 on NobodyWho's suite)
  • Vision — image understanding via the companion mmproj-BF16.gguf projection model
  • Reasoning — thinking mode (on by default)
  • Long context — up to 256k tokens
  • Multilingual — broad language coverage

Available Quantizations

File Approach Tool-calling tests
Qwen_Qwen3.5-4B-BF16-vendor-sampling.gguf Vendor sampling injected 14/14
Qwen_Qwen3.5-4B-Q8_0-vendor-sampling.gguf Vendor sampling injected 14/14
Qwen_Qwen3.5-4B-Q4_K_M-vendor-sampling.gguf Vendor sampling injected 14/14
mmproj-BF16.gguf Vision projection (use with any of the above)

Verified with NobodyWho's tool-calling suite across BF16 / Q8_0 / Q4_K_M (14/14 each, June 2026); vision, reasoning, and multilingual verified. The upstream GGUF has no general.sampling.* metadata, so the -vendor-sampling files embed Qwen's recommended sampler (see INJECTION.md).

Quick Start

Using the NobodyWho library:

from nobodywho import Chat

chat = Chat("huggingface:NobodyWho/Qwen_Qwen3.5-4B-GGUF/Qwen_Qwen3.5-4B-Q4_K_M-vendor-sampling.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response)  # The capital of Denmark is Copenhagen.

Vision

from nobodywho import Model, Chat, Prompt, Image, Text

model = Model(
    "huggingface:NobodyWho/Qwen_Qwen3.5-4B-GGUF/Qwen_Qwen3.5-4B-Q4_K_M-vendor-sampling.gguf",
    projection_model_path="huggingface:NobodyWho/Qwen_Qwen3.5-4B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
    Text("What is in this image?"),
    Image("./photo.png"),
])).completed()
print(response)

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NobodyWho/Qwen_Qwen3.5-4B-GGUF",
    filename="Qwen_Qwen3.5-4B-Q4_K_M-vendor-sampling.gguf",
)

Model Specifications

  • Parameters: 4B
  • Context length: 262,144 tokens (256K)
  • License: Apache 2.0
  • Base model: Qwen/Qwen3.5-4B
  • Architecture: qwen35 (vision-capable)

Licensing / Credits

Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to the Qwen team, Alibaba Group. GGUF quantizations provided by unsloth.

Downloads last month
236
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NobodyWho/Qwen_Qwen3.5-4B-GGUF

Finetuned
Qwen/Qwen3.5-4B
Quantized
(250)
this model

Collection including NobodyWho/Qwen_Qwen3.5-4B-GGUF