Gemma-4-12B-IT-ASSISTANT GGUF

Experimental GGUF quantizations of Google's gemma-4-12B-it-assistant. This model contains the MTP (multi-token prediction) draft heads and is meant to serve as the draft model for speculative decoding alongside the main model. It likely does not work with llama.cpp yet, but it may be useful if you are developing a branch that adds gemma-4-12b MTP support.
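
For readers new to the idea, here is a minimal sketch of one greedy speculative decoding step. It is a generic toy, not how llama.cpp or the gemma-4 MTP heads implement it: `target_next` and `draft_next` are hypothetical callables standing in for the main model and the draft model, and the target is queried one token at a time for clarity, whereas a real implementation would verify all drafted tokens in a single batched forward pass.

```python
from typing import Callable, List

# Hypothetical interface: a "model" is any function that maps a token
# context to its greedy next token. Real models return logits instead.
NextToken = Callable[[List[int]], int]


def speculative_step(
    target_next: NextToken,
    draft_next: NextToken,
    context: List[int],
    k: int = 4,
) -> List[int]:
    """Run one greedy speculative step and return the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft_next(context + proposal))

    # 2. The target model checks the proposals in order. The first
    #    disagreement is replaced by the target's own token and ends the step.
    accepted: List[int] = []
    for token in proposal:
        expected = target_next(context + accepted)
        if token != expected:
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


if __name__ == "__main__":
    # Toy models: the target counts context length modulo 5; the draft
    # agrees except when the context holds exactly 3 tokens.
    target = lambda ctx: len(ctx) % 5
    draft = lambda ctx: 99 if len(ctx) == 3 else len(ctx) % 5
    print(speculative_step(target, draft, [0], k=4))
```

With greedy verification like this, the accepted tokens always match what the target model would have produced on its own; the draft model only affects how many tokens are accepted per step.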

Available variants

  • gemma-4-12b-assistant-bf16.gguf - Full BF16 precision
  • gemma-4-12b-assistant-q4_k_m.gguf - Q4_K_M quantized
  • gemma-4-12b-assistant-q5_k_m.gguf - Q5_K_M quantized
  • gemma-4-12b-assistant-q6_k.gguf - Q6_K quantized
  • gemma-4-12b-assistant-q8_0.gguf - Q8_0 quantized
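
If you want to fetch a single variant programmatically, the following sketch maps a quantization name to the file names listed above and downloads it with the third-party `huggingface_hub` package. The repository id `cloudnathan5/gemma-4-12B-it-assistant-GGUF` is assumed from the page this card is published on, and the helper names `variant_filename` and `download_variant` are illustrative, not part of any library.

```python
# File names copied from the "Available variants" list.
VARIANTS = {
    "bf16": "gemma-4-12b-assistant-bf16.gguf",
    "q4_k_m": "gemma-4-12b-assistant-q4_k_m.gguf",
    "q5_k_m": "gemma-4-12b-assistant-q5_k_m.gguf",
    "q6_k": "gemma-4-12b-assistant-q6_k.gguf",
    "q8_0": "gemma-4-12b-assistant-q8_0.gguf",
}

# Assumption: repository id as shown on the hosting page.
REPO_ID = "cloudnathan5/gemma-4-12B-it-assistant-GGUF"


def variant_filename(quant: str) -> str:
    """Return the GGUF file name for a quantization such as 'Q4_K_M'."""
    key = quant.strip().lower()
    if key not in VARIANTS:
        raise ValueError(f"unknown variant {quant!r}; choose from {sorted(VARIANTS)}")
    return VARIANTS[key]


def download_variant(quant: str, local_dir: str = "models") -> str:
    """Download one variant and return its local path (requires network access)."""
    # Imported lazily so the name lookup above works without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=variant_filename(quant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(variant_filename("Q4_K_M"))
```

Calling `download_variant("q8_0")` would then place the Q8_0 file under `models/`, assuming the repository id above is correct and the file is still published.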

Format: GGUF
Model size: 0.4B params
Architecture: gemma4

Repository: cloudnathan5/gemma-4-12B-it-assistant-GGUF