NobodyWho/Qwen_Qwen3.6-27B-GGUF

Overview

GGUF quantization of Qwen3.6-27B, prepared for NobodyWho: it works with NobodyWho out of the box, with Qwen's recommended sampling metadata embedded in every quant, and is verified with NobodyWho's test suite. Qwen3.6-27B is Alibaba's dense flagship — natively multimodal (text + image), with strong reasoning and best-in-class agentic tool calling (it rivals far larger MoE models on agentic-coding benchmarks).

Model Capabilities

  • Text generation — instruction-following chat
  • Tool calling — native function calling with grammar-constrained output (14/14 on NobodyWho's suite)
  • Vision — image understanding via the companion mmproj-BF16.gguf projection model
  • Reasoning — thinking mode (on by default)
  • Long context — up to 256k tokens
  • Multilingual — broad language coverage

Available Quantizations

File Approach Tool-calling tests
Qwen_Qwen3.6-27B-Q3_K_M-vendor-sampling.gguf Vendor sampling injected 14/14
Qwen_Qwen3.6-27B-Q4_K_M-vendor-sampling.gguf Vendor sampling injected 14/14
Qwen_Qwen3.6-27B-Q8_0-vendor-sampling.gguf Vendor sampling injected not separately run (RAM)
mmproj-BF16.gguf Vision projection (use with any of the above)

Verified with NobodyWho's suite — tool calling 14/14 on Q3_K_M and Q4_K_M, vision on Q3_K_M (June 2026); Q8_0 not separately run (RAM). Q3_K_M (≈13.6 GB) is the comfortable fit on 24 GB; Q4_K_M (≈16.8 GB) also runs on 24 GB but is tight — it swaps and runs slower (tested 14/14); Q8_0 (≈28.6 GB) wants 32 GB+. BF16 (≈54 GB) is not hosted. The upstream GGUF has no general.sampling.* metadata, so all quants embed Qwen's recommended sampler (see INJECTION.md).

Quick Start

Using the NobodyWho library:

from nobodywho import Chat

chat = Chat("huggingface:NobodyWho/Qwen_Qwen3.6-27B-GGUF/Qwen_Qwen3.6-27B-Q3_K_M-vendor-sampling.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response)  # The capital of Denmark is Copenhagen.

Vision

from nobodywho import Model, Chat, Prompt, Image, Text

model = Model(
    "huggingface:NobodyWho/Qwen_Qwen3.6-27B-GGUF/Qwen_Qwen3.6-27B-Q3_K_M-vendor-sampling.gguf",
    projection_model_path="huggingface:NobodyWho/Qwen_Qwen3.6-27B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
    Text("What is in this image?"),
    Image("./photo.png"),
])).completed()
print(response)

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NobodyWho/Qwen_Qwen3.6-27B-GGUF",
    filename="Qwen_Qwen3.6-27B-Q3_K_M-vendor-sampling.gguf",
)

Model Specifications

  • Parameters: 27B (dense)
  • Context length: 262,144 tokens (256K)
  • License: Apache 2.0
  • Base model: Qwen/Qwen3.6-27B
  • Architecture: qwen35 (vision-capable)

Licensing / Credits

Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to the Qwen team, Alibaba Group. GGUF quantizations provided by unsloth.

Downloads last month
49
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

3-bit

4-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NobodyWho/Qwen_Qwen3.6-27B-GGUF

Base model

Qwen/Qwen3.6-27B
Quantized
(487)
this model

Collection including NobodyWho/Qwen_Qwen3.6-27B-GGUF