NobodyWho/Google_Gemma4-E2B-GGUF

Overview

GGUF quantizations of Google's Gemma 4 E2B instruction-tuned model, re-hosted for NobodyWho. The unsloth build already ships a tool-calling setup and the recommended sampling metadata (general.sampling: temp 1.0, top_k 64, top_p 0.95), so nothing needs patching; the model has been verified with NobodyWho's test suite. E2B is the smallest and most on-device-friendly Gemma 4 variant: multimodal (text and image), multilingual, and Apache 2.0 licensed.
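To make the recommended sampling values concrete, here is a minimal, self-contained sketch of what temperature 1.0, top_k 64 and top_p 0.95 do to a vector of next-token logits. It is an illustration only: the function names are hypothetical, and runtimes such as llama.cpp use their own sampler chain and may apply these steps in a different order (with temperature 1.0 the scaling step is an identity, so its position does not change the result here).

```python
import math
import random
from typing import List, Sequence, Tuple


def filter_candidates(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: int = 64,
    top_p: float = 0.95,
) -> List[Tuple[int, float]]:
    """Return (token_id, probability) pairs that survive temperature, top-k and top-p."""
    # Temperature: divide logits before the softmax (1.0 leaves them unchanged).
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors, subtracting the max for numerical stability.
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    # Top-p (nucleus): keep the shortest prefix whose probability mass reaches top_p.
    kept, mass = [], 0.0
    for i, w in zip(ranked, weights):
        p = w / total
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    return [(i, p / mass) for i, p in kept]


def sample_token(logits: Sequence[float], rng: random.Random, **kwargs) -> int:
    """Draw one token id from the filtered distribution."""
    candidates = filter_candidates(logits, **kwargs)
    ids = [i for i, _ in candidates]
    probs = [p for _, p in candidates]
    return rng.choices(ids, weights=probs, k=1)[0]
```

With a sharply peaked distribution, top_p 0.95 can cut the candidate set down to a single token, while top_k 64 caps how many tokens are ever considered.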

Model Capabilities

  • Text generation — instruction-following chat
  • Tool calling — native function calling with grammar-constrained output
  • Vision — image understanding via the companion mmproj-BF16.gguf projection model
  • Long context — 128k tokens
  • Multilingual — 140+ languages
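Grammar-constrained tool calling typically works by describing each tool's parameters as a schema, restricting generation so the model can only emit a call that matches it, and then dispatching that call in the application. The sketch below shows only the application side, in plain Python. Everything in it is an assumption for illustration: the helpers tool_schema and dispatch, the example tool get_temperature, and the JSON call shape {"name": ..., "arguments": {...}} are hypothetical, and the Gemma 4 chat template and NobodyWho define the real format and perform the constraining themselves.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Map Python annotations to JSON Schema type names (simplified subset).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Describe fn's parameters as a JSON Schema object (hypothetical helper)."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": _JSON_TYPES[param.annotation]}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }


def dispatch(call_json: str, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Run a model-emitted call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    return tools[call["name"]](**call["arguments"])


def get_temperature(city: str, celsius: bool = True) -> str:
    """Return a (fake) temperature reading for a city."""
    return f"21 degrees {'C' if celsius else 'F'} in {city}"
```

Because the model's output is constrained to the schema, dispatch can assume well-formed JSON with known argument names; an unconstrained model would need extra validation before the call.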

Available Quantizations

File                          Approach                                        Tool-calling tests
gemma-4-E2B-it-BF16.gguf      Sampling embedded upstream                      14/14
gemma-4-E2B-it-Q8_0.gguf      Sampling embedded upstream                      14/14
gemma-4-E2B-it-Q4_K_M.gguf    Sampling embedded upstream                      14/14
mmproj-BF16.gguf              Vision projection (use with any of the above)

Verified with NobodyWho's tool-calling suite across BF16 / Q8_0 / Q4_K_M (14/14 each, June 2026); vision and multilingual verified per-model. Quant names follow the unsloth gemma-4-E2B-it-GGUF repo.
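When choosing among the files above, a back-of-envelope size estimate is parameters × bits per weight / 8 bytes. The sketch below assumes the roughly 5B total parameters the Hub reports for these GGUF files; BF16 is 16 bits per weight and Q8_0 is 8.5 (blocks of 32 int8 weights plus one fp16 scale), while the Q4_K_M figure of 4.85 is an assumed average, since that type mixes block formats. Real files also keep some tensors at higher precision and exclude the separate mmproj file, so treat the numbers as rough guides only.

```python
# Assumed average bits per weight for each file type in the table above.
BITS_PER_WEIGHT = {
    "BF16": 16.0,    # 2 bytes per weight
    "Q8_0": 8.5,     # blocks of 32 int8 weights plus one fp16 scale
    "Q4_K_M": 4.85,  # assumption: mixed Q4_K/Q6_K blocks, roughly 4.8-4.9 bpw
}


def estimated_size_gb(n_params: float, quant: str) -> float:
    """Back-of-envelope file size in GB (1e9 bytes): params * bits per weight / 8."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    # 5e9 is the approximate total parameter count reported on the model page.
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimated_size_gb(5e9, quant):.1f} GB")
```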

Quick Start

Using the NobodyWho library:

from nobodywho import Chat

chat = Chat("huggingface:NobodyWho/Google_Gemma4-E2B-GGUF/gemma-4-E2B-it-Q4_K_M.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response)  # The capital of Denmark is Copenhagen.

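Vision

Load the main GGUF together with the mmproj-BF16.gguf projection model, then ask with a Prompt that combines Text and Image parts: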

from nobodywho import Model, Chat, Prompt, Image, Text

model = Model(
    "huggingface:NobodyWho/Google_Gemma4-E2B-GGUF/gemma-4-E2B-it-Q4_K_M.gguf",
    projection_model_path="huggingface:NobodyWho/Google_Gemma4-E2B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
    Text("What is in this image?"),
    Image("./photo.png"),
])).completed()
print(response)

llama-cpp-python

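from llama_cpp import Llama  # pip install llama-cpp-python huggingface-hub

llm = Llama.from_pretrained(
    repo_id="NobodyWho/Google_Gemma4-E2B-GGUF",
    filename="gemma-4-E2B-it-Q4_K_M.gguf",
)
# Sampling values mirror the recommended general.sampling metadata.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of Denmark?"}],
    temperature=1.0, top_k=64, top_p=0.95,
)
print(out["choices"][0]["message"]["content"])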

Model Specifications

  • Parameters: ≈2.3B effective (E2B); the Hub reports about 5B total parameters for the GGUF
  • Context length: 131,072 tokens
  • License: Apache 2.0
  • Base model: google/gemma-4-E2B-it
  • Architecture: gemma4 (vision-capable)
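The architecture, context length and embedded sampling defaults all live in the GGUF header as key/value metadata. The following is a minimal pure-Python sketch for reading that header, based on the GGUF v2/v3 layout (magic, version, tensor count, metadata count, then typed key/value pairs, all little-endian). The helper names are hypothetical, and the exact general.sampling key names are an assumption, so interesting_keys simply filters by prefix; the gguf package maintained in the llama.cpp repository offers a fuller reader.

```python
import struct
from typing import Any, BinaryIO, Dict

# GGUF metadata value type ids mapped to little-endian struct formats.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(fmt: str, f: BinaryIO) -> Any:
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f: BinaryIO) -> str:
    length = _read("<Q", f)  # uint64 byte length, then UTF-8 bytes
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(vtype: int, f: BinaryIO) -> Any:
    if vtype in _SCALARS:
        return _read(_SCALARS[vtype], f)
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read("<I", f)
        count = _read("<Q", f)
        return [_read_value(elem_type, f) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Return the key/value metadata from the header of a GGUF v2/v3 stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read("<I", f)
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    _read("<Q", f)  # tensor count; tensor infos follow the metadata and are skipped
    kv_count = _read("<Q", f)
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        meta[key] = _read_value(_read("<I", f), f)
    return meta


def interesting_keys(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the architecture, context-length and general.sampling* entries."""
    return {
        k: v for k, v in meta.items()
        if k == "general.architecture"
        or k.endswith(".context_length")
        or k.startswith("general.sampling")
    }
```

For a downloaded file, open it in binary mode and pass the handle to read_gguf_metadata, then print interesting_keys of the result. Reading stops at the end of the metadata, although large tokenizer arrays are still parsed on the way.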

Licensing / Credits

Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to Google DeepMind. GGUF quantizations provided by unsloth.
