NobodyWho/Google_Gemma4-12B-GGUF

Overview

GGUF quantization of Google's Gemma 4 12B (Unified) model, re-hosted for NobodyWho. The unsloth build already ships a tool-calling setup and recommended sampling metadata (general.sampling: temp 1.0, top_k 64, top_p 0.95), so nothing needs patching — the model is verified with NobodyWho's test suite. The 12B Unified variant is the laptop-class Gemma 4 — stronger reasoning and multimodal capability than the edge (E2B/E4B) models while staying well below the larger MoE/dense variants in memory. Multimodal (text + image), multilingual, Apache 2.0.

Model Capabilities

  • Text generation — instruction-following chat, stronger reasoning
  • Tool calling — native function calling with grammar-constrained output
  • Vision — ⚠️ the 12B mmproj (vision + audio encoder) currently fails to load in NobodyWho (llama.cpp MTMD/CLIP init error); needs a newer llama.cpp. Text + tool calling are unaffected. For vision today, use Gemma 4 E2B/E4B (verified working)
  • Long context — 256k tokens
  • Multilingual — 140+ languages

Available Quantizations

File Approach Tool-calling tests
gemma-4-12b-it-BF16.gguf Sampling embedded upstream not separately run
gemma-4-12b-it-Q8_0.gguf Sampling embedded upstream 14/14
gemma-4-12b-it-Q4_K_M.gguf Sampling embedded upstream 14/14
mmproj-BF16.gguf Vision projection — ⚠️ does not load in NobodyWho yet

Tool calling verified on Q8_0 and Q4_K_M (14/14 each, June 2026; BF16 hosted but not separately tested — 24 GB). Vision: the 12B mmproj fails to load in the current NobodyWho build (llama.cpp MTMD/CLIP init error) — Gemma 4 E2B/E4B vision is verified working. Quant names follow the unsloth gemma-4-12b-it-GGUF repo.

Quick Start

Using the NobodyWho library:

from nobodywho import Chat

chat = Chat("huggingface:NobodyWho/Google_Gemma4-12B-GGUF/gemma-4-12b-it-Q4_K_M.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response)  # The capital of Denmark is Copenhagen.

Vision

⚠️ Not working yet on 12B: the mmproj fails to load in the current NobodyWho build. The snippet below is the intended API (it works for Gemma 4 E2B/E4B today).

from nobodywho import Model, Chat, Prompt, Image, Text

model = Model(
    "huggingface:NobodyWho/Google_Gemma4-12B-GGUF/gemma-4-12b-it-Q4_K_M.gguf",
    projection_model_path="huggingface:NobodyWho/Google_Gemma4-12B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
    Text("What is in this image?"),
    Image("./photo.png"),
])).completed()
print(response)

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NobodyWho/Google_Gemma4-12B-GGUF",
    filename="gemma-4-12b-it-Q4_K_M.gguf",
)

Model Specifications

  • Parameters: 12B (Unified)
  • Context length: 262,144 tokens (256K)
  • License: Apache 2.0
  • Base model: google/gemma-4-12B
  • Architecture: gemma4 (vision-capable)

Licensing / Credits

Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to Google DeepMind. GGUF quantizations provided by unsloth.

Downloads last month
343
GGUF
Model size
12B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NobodyWho/Google_Gemma4-12B-GGUF

Quantized
(31)
this model

Collection including NobodyWho/Google_Gemma4-12B-GGUF