Instructions to use LLMWildling/gemma-4-120b-a12b-coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use LLMWildling/gemma-4-120b-a12b-coder with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# Image + text chat messages need the "image-text-to-text" task;
# the "text-generation" pipeline does not accept a `text=` keyword or image parts.
pipe = pipeline("image-text-to-text", model="LLMWildling/gemma-4-120b-a12b-coder")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("LLMWildling/gemma-4-120b-a12b-coder")
# device_map="auto" places the weights on the available GPUs (requires `accelerate`);
# without it the full checkpoint is loaded onto the CPU.
model = AutoModelForMultimodalLM.from_pretrained(
    "LLMWildling/gemma-4-120b-a12b-coder",
    device_map="auto",
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
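For decoder-only models, `model.generate` returns the prompt tokens followed by the continuation, which is why the snippet above slices `outputs[0]` at `inputs["input_ids"].shape[-1]` before decoding. The pure-Python sketch below uses plain lists in place of tensors and a hypothetical helper, `new_tokens_only`, to show that one offset works for every row of a batch, because all rows of a padded `input_ids` tensor share the same width.

```python
from typing import List, Sequence


def new_tokens_only(sequences: Sequence[Sequence[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first prompt_len positions of every generated row.

    Equivalent to outputs[i][inputs["input_ids"].shape[-1]:] for each row i:
    every row of a padded batch has the same prompt width, so one offset works.
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    return [list(row[prompt_len:]) for row in sequences]


# Toy batch: token id 0 is padding; both prompts are left-padded to width 4.
prompt_ids = [[0, 0, 11, 12], [21, 22, 23, 24]]
generated = [[0, 0, 11, 12, 101, 102], [21, 22, 23, 24, 201, 202]]
print(new_tokens_only(generated, len(prompt_ids[0])))  # [[101, 102], [201, 202]]
```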

Notebooks
Google Colab
Kaggle

vLLM

How to use LLMWildling/gemma-4-120b-a12b-coder with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LLMWildling/gemma-4-120b-a12b-coder"
# In another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLMWildling/gemma-4-120b-a12b-coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
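The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server started above is listening on `localhost:8000` and returns the standard OpenAI response shape (`choices[0].message.content`). The helper names `build_chat_payload` and `chat` are hypothetical, and `max_tokens=256` is an arbitrary default. Because the SGLang server exposes the same route, the function should also work against `http://localhost:30000`.

```python
import json
import urllib.request
from typing import Any, Dict, List

MODEL_ID = "LLMWildling/gemma-4-120b-a12b-coder"


def build_chat_payload(messages: List[Dict[str, Any]], model: str = MODEL_ID,
                       max_tokens: int = 256) -> bytes:
    """Serialize an OpenAI-style /v1/chat/completions request body as UTF-8 JSON."""
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return json.dumps(body).encode("utf-8")


def chat(base_url: str, messages: List[Dict[str, Any]], timeout: float = 300.0) -> str:
    """POST a chat request and return the text of the first choice."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=build_chat_payload(messages),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # Standard OpenAI response shape: choices[0].message.content
    return reply["choices"][0]["message"]["content"]


# Usage against the vLLM server started above (port 8000):
# print(chat("http://localhost:8000",
#            [{"role": "user", "content": "What is the capital of France?"}]))
```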

Use Docker

```bash
docker model run hf.co/LLMWildling/gemma-4-120b-a12b-coder
```

SGLang

How to use LLMWildling/gemma-4-120b-a12b-coder with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LLMWildling/gemma-4-120b-a12b-coder" \
    --host 0.0.0.0 \
    --port 30000
# In another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLMWildling/gemma-4-120b-a12b-coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LLMWildling/gemma-4-120b-a12b-coder" \
        --host 0.0.0.0 \
        --port 30000
# In another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLMWildling/gemma-4-120b-a12b-coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
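Downloading and loading a checkpoint of this size may take a while, and the curl call fails until the server has finished starting. A minimal sketch of a readiness check, assuming the server answers `GET /v1/models` with HTTP 200 once the model is loaded (both vLLM and SGLang serve this route as part of their OpenAI-compatible API). The function name `wait_until_ready` and the default timeouts are hypothetical choices, not part of either project.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 900.0, interval_s: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or timeout_s elapses.

    Returns True once the server answers, False if the deadline passes first.
    """
    url = f"{base_url.rstrip('/')}/v1/models"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, socket timeout, or HTTPError (a URLError,
            # hence an OSError): the server is not ready yet, so keep polling.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))


# Usage with the container above (port 30000):
# if wait_until_ready("http://localhost:30000"):
#     print("SGLang server is up")
```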

Docker Model Runner
How to use LLMWildling/gemma-4-120b-a12b-coder with Docker Model Runner:
```
docker model run hf.co/LLMWildling/gemma-4-120b-a12b-coder
```

Confused about tags - Is this model text-only or multimodal?

by bandageshi - opened 13 days ago

Discussion

bandageshi

13 days ago

Hi! I'm a beginner and a bit confused about this model's capabilities.

The tags on the page say image-text-to-text, but in your serving instructions, the vLLM command uses --language-model-only. Also, I noticed there's no preprocessor_config.json in the files.

Could you clarify if this model supports image inputs, or if it's purely for text?

Thanks for the cool model!

LLMWildling

Owner 6 days ago

Hi @bandageshi. All of these models are trained for MXFP4 text + reasoning. Some of the commands you see are test commands that my agents uploaded automatically after evals. I did not touch the vision part, but it should still work.

Let me know if I can help out!

bandageshi

6 days ago

I just checked the model.safetensors.index.json and noticed that the vision layers are indeed still there. So theoretically, could I just copy over the preprocessor_config.json from the official Gemma 4 model to get the vision features working? Sounds a bit risky though lol.

By the way, I think this model is really cool! I've been hoping Google would release a mid-sized MoE model like gpt-oss-120b, but it's pretty clear they don't want open-source models eating Gemini's lunch.

I'll definitely play around with it and give it a try. An NVFP4 quantized version would be awesome if you ever plan on making one!
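The `model.safetensors.index.json` check described in this thread can be scripted. In a sharded safetensors checkpoint, the index file contains a `weight_map` object that maps every tensor name to the shard file holding it, so filtering those names shows whether vision weights were kept. This is a minimal sketch: the substrings in `VISION_MARKERS` are an assumption based on naming seen in earlier Gemma multimodal checkpoints in Transformers (`vision_tower`, `multi_modal_projector`) and may not match this model, and the helper names are hypothetical. Finding these tensors only shows the weights are present; it does not test whether a copied `preprocessor_config.json` fits them.

```python
import json
from collections import Counter
from typing import Dict, Iterable

# Assumed name fragments for vision weights; compare them against the real index.
VISION_MARKERS = ("vision", "multi_modal_projector")


def vision_tensors(weight_map: Dict[str, str],
                   markers: Iterable[str] = VISION_MARKERS) -> Dict[str, str]:
    """Return the entries of weight_map whose tensor name contains any marker."""
    markers = tuple(markers)
    return {name: shard for name, shard in weight_map.items()
            if any(marker in name for marker in markers)}


def vision_tensors_per_shard(index_path: str) -> Counter:
    """Count matching tensors per shard file in a model.safetensors.index.json."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    return Counter(vision_tensors(index["weight_map"]).values())


# Usage (path is wherever the index file was downloaded):
# print(vision_tensors_per_shard("model.safetensors.index.json"))
```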
