vLLM support?
Does google/gemma-4-12B-it run on vLLM?
Hi @mohamedemam
Yes, please check these: https://recipes.vllm.ai/Google/gemma-4-12B-it
https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html
Thanks
I tried quantizing it to 8-bit and 4-bit, and I also tried the QAT W4A16 models, but they all give me the same error. I couldn't run the bf16 version.
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) File "/root/.cache/vllm/torch_compile_cache/torch_aot_compile/57cb278580f583e642b8ddf152162e3ac7ebda35934706bfea4ee601dbd55f1f/inductor_cache/ty/ctykm2x3wxgvxbdzmxlcrwegdpwgd6hxzp5z4j5ctz6zbogstkyl.py", line 1372, in call
(EngineCore pid=260562) buf0 = torch.ops._C.marlin_gemm.default(reinterpret_tensor(arg0_1, (s70, 4096), (4096, 1), 0), None, arg3_1, None, arg4_1, None, None, arg5_1, arg6_1, arg7_1, arg8_1, 1125899907892224, s70, 3840, 8192, True, False, True, False)
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/torch/_ops.py", line 865, in call
(EngineCore pid=260562) return self._op(*args, **kwargs)
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/torch/_compile.py", line 54, in inner
(EngineCore pid=260562) return disable_fn(*args, **kwargs)
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
(EngineCore pid=260562) return fn(*args, **kwargs)
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 409, in torch_dispatch
(EngineCore pid=260562) res = func(*args, **kwargs)
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) File "/workspace/Cairoai/.venv-vllm/lib/python3.12/site-packages/torch/_ops.py", line 865, in call
(EngineCore pid=260562) return self._op(*args, **kwargs)
(EngineCore pid=260562) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=260562) RuntimeError: Shape mismatch: a.size(1) = 4096, size_k = 8192
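For anyone reading the RuntimeError above: it looks like a plain GEMM dimension check. The sketch below is not vLLM's code; `check_gemm_shapes` is a hypothetical helper, and reading the trailing integers in the `marlin_gemm` call (`s70, 3840, 8192`) as M, N, K is an assumption based on the error message. The row count 16 is an arbitrary stand-in for the symbolic `s70`. It only illustrates that the activation has 4096 columns while the kernel was told the reduction dimension is 8192.

```python
def check_gemm_shapes(a_shape, size_m, size_n, size_k):
    """Hypothetical helper: validate an (M, K) activation against the M/N/K
    sizes handed to a GEMM kernel. Mirrors the message seen in the log."""
    rows, cols = a_shape
    if rows != size_m:
        raise RuntimeError(f"Shape mismatch: a.size(0) = {rows}, size_m = {size_m}")
    if cols != size_k:
        raise RuntimeError(f"Shape mismatch: a.size(1) = {cols}, size_k = {size_k}")
    # On success the output of the matmul would have shape (M, N).
    return (size_m, size_n)


# Values from the log: the activation is (s70, 4096) but size_k is 8192.
try:
    check_gemm_shapes((16, 4096), size_m=16, size_n=3840, size_k=8192)
except RuntimeError as err:
    message = str(err)
    print(message)
```

If that reading is right, the quantized weight metadata and the layer's actual input width disagree by a factor of two, which would point at how the checkpoint was quantized or loaded rather than at the prompt or batch size.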
[rank0]:[W608 14:18:26.673128869 ProcessGroupNCCL.cpp:1575] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=259701) Traceback (most recent call last):
(APIServer pid=259701) File "/workspace/Cairoai/.venv-vllm/bin/vllm", line 10, in