Gemma 4 12B Instruction-Tuned — GGUF (multimodal)

Community GGUF mirror of google/gemma-4-12B-it for local, encoder-free multimodal AI on consumer hardware (~16 GB VRAM).

Announced June 2026: Google blog · Developer guide

Parameters ~12B dense
Modalities Text, vision, audio (native in backbone)
License Apache 2.0
Architecture Encoder-free (no separate vision/audio towers)
Context See upstream config
Vision in GGUF Requires mmproj-*.gguf alongside main weights

Why this repo exists

  • One download hub for all major quants (K-quants, IQ, Q8, mmproj).
  • Fast Hub-side sync from bartowski/gemma-4-12B-it-GGUF — no re-upload from your laptop.
  • Documented use cases for contributors: gemma-4-12b-local (agents, LiteRT, llama.cpp, MLX).

Available files

See gguf-manifest.json for the live file list.

Essential tier (recommended)

File Use
gemma-4-12B-it-Q4_K_M.gguf Best balance — 16 GB laptops
gemma-4-12B-it-Q5_K_M.gguf Higher quality
gemma-4-12B-it-Q6_K.gguf / Q8_0 Max quality
gemma-4-12B-it-Q3_K_M.gguf Tighter VRAM
gemma-4-12B-it-Q2_K.gguf Minimum size
gemma-4-12B-it-IQ4_XS.gguf / IQ4_NL.gguf IQ variants
mmproj-gemma-4-12B-it-f16.gguf Required for images in llama.cpp

Full tier

All bartowski quants (Q2_K_L, Q3_K_XL, Q4_0, Q4_1, bf16, imatrix, etc.) — run make sync-gemma4-gguf-full.

Download

pip install -U huggingface_hub

# Text + vision (recommended)
huggingface-cli download Edmon02/gemma-4-12B-it-GGUF \
  gemma-4-12B-it-Q4_K_M.gguf \
  mmproj-gemma-4-12B-it-f16.gguf \
  --local-dir ./models/gemma-4-12b

Accept the license on google/gemma-4-12B-it before using weights.

Quick start

llama.cpp (text)

llama-cli -m gemma-4-12B-it-Q4_K_M.gguf -p "Explain encoder-free multimodal models in 3 bullets." -n 256

llama.cpp (image + text)

llama-mtmd-cli \
  -m gemma-4-12B-it-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-12B-it-f16.gguf \
  --image photo.jpg \
  -p "Describe this image."

LiteRT-LM (OpenAI-compatible local server)

litert-lm import --from-huggingface-repo=litert-community/gemma-4-12B-it-litert-lm gemma-4-12B-it.litertlm gemma4-12b
litert-lm serve

LM Studio / Ollama

Import Edmon02/gemma-4-12B-it-GGUF and select Q4_K_M + mmproj.

Use cases

Use case Quant Tool
Local coding agent Q4_K_M OpenCode, Continue, Aider
Voice + vision assistant Q5_K_M + mmproj Google AI Edge Gallery / Eloquent (Mac)
Armenian + English research Q4_K_M Pair with HyVoxPopuli ASR/TTS
Low-VRAM laptop Q3_K_M or IQ4_XS llama.cpp
Fast inference MTP drafter (upstream) Google checkpoint + compatible runtime

Hardware guide

VRAM Suggested files
8 GB IQ4_XS or Q3_K_M (text only, short context)
16 GB Q4_K_M + mmproj
24 GB+ Q6_K or Q8_0 + mmproj

Provenance

Item Source
Base model google/gemma-4-12B-it
GGUF quants Mirrored from bartowski/gemma-4-12B-it-GGUF
Maintainer scripts Edmon02/audio_set scripts/sync_gemma4_gguf_quants.py

Limitations

  • Community quants — validate quality on your tasks vs official BF16.
  • Audio in GGUF may require latest llama.cpp / LM Studio builds.
  • Gated upstream — HF token + license acceptance required for google/* repos.

Contributing

Add recipes under projects/gemma-4-12b-local/examples/. See CONTRIBUTING.md in that folder.

Citation

@article{gemma_2026,
  title={Gemma 4},
  author={Google DeepMind},
  year={2026},
  url={https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/}
}
Downloads last month
5,028
GGUF
Model size
12B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Edmon02/gemma-4-12B-it-GGUF

Quantized
(133)
this model