Instructions to use google/gemma-4-12B-it with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use google/gemma-4-12B-it with Transformers:
    # Load model directly
    from transformers import AutoProcessor, AutoModelForMultimodalLM
    processor = AutoProcessor.from_pretrained("google/gemma-4-12B-it")
    model = AutoModelForMultimodalLM.from_pretrained("google/gemma-4-12B-it")
- Notebooks
- Google Colab
- Kaggle
How to use an assistant model for MTP in llama.cpp?
#29
by Regrin - opened
My example is from the 31B model, but it should be the same for every model in the Gemma 4 family:
    --model gemma-4-31B-it-qat-UD-Q4_K_XL.gguf
    --model-draft gemma-4-31b-it-qat-q4_0-assistant.gguf
    --spec-type draft-mtp
    --spec-draft-n-max 4
Tune the --spec-draft-* options to your liking.
GGUF files should be available on Hugging Face, since people have already converted them.
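To get a feel for what a setting like --spec-draft-n-max trades off, here is a toy sketch of greedy speculative decoding: a cheap draft model (here, the role the assistant GGUF passed via --model-draft presumably plays) proposes up to n_max tokens, the full model verifies them, and the longest matching prefix is kept plus one token from the full model. This is a hypothetical illustration, not llama.cpp's implementation; toy_target_next, toy_draft_next and the other names are made up for the example, and it assumes --spec-draft-n-max caps the number of tokens drafted per round, as its name suggests.

```python
"""Toy greedy speculative decoding with a draft (assistant) model.

Hypothetical illustration only: the toy_* functions stand in for the full
model and the assistant. This is not llama.cpp's implementation.
"""
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_target_next(seq: List[int]) -> int:
    # Hypothetical "full model": always predicts last token + 1.
    return seq[-1] + 1


def toy_draft_next(seq: List[int]) -> int:
    # Hypothetical "assistant": agrees with the target except when
    # last % 3 == 2, where it guesses wrong.
    return seq[-1] + (2 if seq[-1] % 3 == 2 else 1)


def speculative_step(tokens: List[int], target_next: NextToken,
                     draft_next: NextToken, n_max: int) -> List[int]:
    """One draft-then-verify round; returns the newly accepted tokens."""
    # 1. The draft model proposes up to n_max tokens autoregressively.
    draft: List[int] = []
    for _ in range(n_max):
        draft.append(draft_next(tokens + draft))
    # 2. The target checks each proposal (a real engine does this in one
    #    batched forward pass over all drafted positions).
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # 3. All drafts matched: the verification pass yields one bonus token.
    accepted.append(target_next(tokens + accepted))
    return accepted


def generate(prompt: List[int], target_next: NextToken, draft_next: NextToken,
             n_max: int, n_new: int) -> Tuple[List[int], int]:
    """Generate n_new tokens; also return how many verify rounds were used."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n_new:
        tokens += speculative_step(tokens, target_next, draft_next, n_max)
        rounds += 1
    return tokens[:len(prompt) + n_new], rounds


def expected_tokens_per_round(alpha: float, n_max: int) -> float:
    """Expected tokens per target pass, assuming each drafted token is
    accepted independently with probability alpha (a simplification)."""
    if alpha == 1.0:
        return n_max + 1.0
    return (1 - alpha ** (n_max + 1)) / (1 - alpha)


if __name__ == "__main__":
    out, rounds = generate([0], toy_target_next, toy_draft_next, 4, 10)
    print(out, rounds)
    for n in (1, 2, 4, 8):
        print(n, round(expected_tokens_per_round(0.8, n), 3))
```

With these toy models the output is the same sequence the full model alone would produce (0 through 10), but it takes 4 verification rounds instead of 10 single-token steps. The expected_tokens_per_round curve flattens as n_max grows when acceptance is imperfect, while every extra drafted token still costs draft-model time, which is why the --spec-draft-* values are worth tuning per model and hardware.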
Thanks!