Instructions to use mlx-community/gemma-4-12B-it-OptiQ-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/gemma-4-12B-it-OptiQ-4bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-12B-it-OptiQ-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mlx-community/gemma-4-12B-it-OptiQ-4bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/gemma-4-12B-it-OptiQ-4bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/gemma-4-12B-it-OptiQ-4bit" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use mlx-community/gemma-4-12B-it-OptiQ-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/gemma-4-12B-it-OptiQ-4bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/gemma-4-12B-it-OptiQ-4bit
Run Hermes
hermes
- MLX LM
How to use mlx-community/gemma-4-12B-it-OptiQ-4bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/gemma-4-12B-it-OptiQ-4bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "mlx-community/gemma-4-12B-it-OptiQ-4bit"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mlx-community/gemma-4-12B-it-OptiQ-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
MLX_LM ValueError: Model type gemma4_unified not supported.
I’m trying to run the model mlx-community/gemma-4-12B-it-OptiQ-4bit as the model card suggests and I’m getting a ValueError: Model type gemma4_unified not supported.
My MLX version is 0.31.2 and my MLX-LM version is 0.31.3.
If I list the Gemma model files that ship with MLX-LM using the following terminal command:
ls /opt/anaconda3/lib/python3.12/site-packages/mlx_lm/models | grep gemma
the list is limited to the following files, with no mention of the model_type gemma4_unified:
gemma.py
gemma2.py
gemma3_text.py
gemma3.py
gemma3n.py
gemma4_text.py
gemma4.py
recurrent_gemma.py
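For anyone whose Python lives somewhere other than /opt/anaconda3, the same listing can be done from Python without hard-coding the site-packages path. This is only a sketch: list_model_modules is a hypothetical helper name, and it assumes that each supported model type corresponds to one module under mlx_lm.models, which is what the ls output above suggests.

```python
import importlib.util
import pkgutil


def list_model_modules(package, needle=""):
    """Return sorted submodule names of `package` whose name contains `needle`."""
    try:
        spec = importlib.util.find_spec(package)
    except ModuleNotFoundError:
        return []  # the parent package is not installed
    if spec is None or spec.submodule_search_locations is None:
        return []  # not installed, or a plain module rather than a package
    found = pkgutil.iter_modules(spec.submodule_search_locations)
    return sorted(m.name for m in found if needle in m.name)


if __name__ == "__main__":
    # Equivalent in spirit to: ls .../mlx_lm/models | grep gemma
    print(list_model_modules("mlx_lm.models", "gemma"))
```

Run it with the same interpreter that runs mlx-lm, so it inspects the install that actually gets imported.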
I’m hoping someone out there can point me in the right direction to resolve this issue.
Thanking you in advance for your time.
same here!!
Thanks for the clear report, and sorry about the misleading model card.
The gemma-4-12B is the unified Gemma-4 variant (model_type: gemma4_unified), which needs two things the card didn't mention:
1. mlx-lm from main, not the 0.31.3 PyPI release. The unified text-tower support (MoE / double-wide MLP / global_head_dim / attention_k_eq_v) landed after 0.31.3. The catch: the main build also reports version 0.31.3, so a version check won't tell them apart — you have to install it from git, not pin a version:
pip install -U "mlx-lm @ git+https://github.com/ml-explore/mlx-lm.git" mlx-optiq
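Since the version string is the same for both builds, one way to tell them apart is to look at the install metadata instead. The sketch below assumes your installer records a PEP 610 direct_url.json file for git installs (pip does; other tools may not), and install_source is a hypothetical helper name, not part of mlx-lm.

```python
import json
from importlib import metadata


def install_source(dist_name):
    """Describe where an installed distribution came from, using PEP 610 metadata."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"
    raw = dist.read_text("direct_url.json")  # None when no direct URL was recorded
    if raw is None:
        # Assumption: no direct_url.json usually means a package-index release.
        return f"{dist.version} (no direct URL recorded, likely a PyPI release)"
    info = json.loads(raw)
    commit = info.get("vcs_info", {}).get("commit_id", "unknown commit")
    return f"{dist.version} (installed from {info.get('url')} at {commit})"


if __name__ == "__main__":
    print(install_source("mlx-lm"))
```

If the output mentions the GitHub URL and a commit id, you are on a git build; if it reports no direct URL, you most likely still have the PyPI release.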
2. import optiq before loading. mlx-lm has no gemma4_unified model type (that's why your ls shows gemma4.py / gemma4_text.py but nothing unified). mlx-optiq registers it (aliases gemma4_unified -> mlx-lm's gemma4 class) at import time:
import optiq # registers the gemma4_unified model type
from mlx_lm import load, generate
model, tok = load("mlx-community/gemma-4-12B-it-OptiQ-4bit")
print(generate(model, tok, "Hello", max_tokens=64))
Only the 12B is affected (it's the unified variant). The other gemma-4 OptiQ quants (e2b, e4b, 26B-A4B, 31B) are plain gemma4 and load fine with stock mlx-lm.
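If you switch between several of these quants, you can decide whether the extra import is needed by reading model_type from the model's config.json. This is a rough sketch: it assumes you have a local copy of the model directory, and read_model_type and needs_optiq are hypothetical helper names.

```python
import json
from pathlib import Path

UNIFIED_MODEL_TYPE = "gemma4_unified"  # the model_type named earlier in this reply


def read_model_type(config_path):
    """Return the `model_type` field of a Hugging Face style config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("model_type")


def needs_optiq(config_path):
    """True when the config declares the unified Gemma-4 variant."""
    return read_model_type(config_path) == UNIFIED_MODEL_TYPE
```

A caller could then run `import optiq` only when `needs_optiq(Path(model_dir) / "config.json")` is true, where model_dir is whatever local path you downloaded the model to.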
We're updating the card to say this. Thanks for flagging it!
Thank you for taking the time and effort to respond to my request, greatly appreciated.