Update

Added a Jinja chat template so the tokenizer formats conversations correctly for chat-style inference with mlx-lm.

MLX: Gemma-4-12B-Coder

This repository contains a non-quantized MLX-converted version of yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1.

The model has been converted to MLX format for local inference on Apple Silicon Macs using the mlx-lm library.

How to Use with MLX

Install the required dependency:

pip install --upgrade mlx-lm
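To confirm the install succeeded before downloading the model weights, a small check like the following may help. This is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of mlx-lm.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "mlx-lm") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"mlx-lm version: {found}" if found else "mlx-lm is not installed")
```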

Run inference from Python:

from mlx_lm import load, generate

# Load the non-quantized MLX model.
model, tokenizer = load("mlx-community/gemma-4-12b-coder-fable5-composer2.5")

prompt = "Write a Python script to sort a dictionary by its values."
messages = [{"role": "user", "content": prompt}]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

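# Greedy (temperature 0) decoding gives deterministic output.
# Recent mlx-lm releases take sampling settings through a sampler object
# rather than a `temp` keyword on generate().
from mlx_lm.sample_utils import make_sampler

response = generate(
    model,
    tokenizer,
    prompt=formatted_prompt,
    verbose=True,
    max_tokens=1024,
    sampler=make_sampler(temp=0.0),
)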
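
The example above covers a single user turn. For a multi-turn conversation, one possible approach is to keep the full message history and re-apply the chat template on every turn. The sketch below assumes the chat template accepts the conventional "user" and "assistant" roles; `chat_turn` and `make_mlx_reply_fn` are hypothetical helper names, not part of mlx-lm.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    reply_fn: Callable[[List[Message]], str],
) -> List[Message]:
    """Add a user message, get a reply, and return the extended history.

    The input `history` list is not modified; a new list is returned.
    """
    messages = history + [{"role": "user", "content": user_text}]
    reply = reply_fn(messages)
    # Assumption: the chat template understands the "assistant" role.
    return messages + [{"role": "assistant", "content": reply}]


def make_mlx_reply_fn(repo_id: str, max_tokens: int = 1024):
    """Build a reply function backed by an MLX model (hypothetical helper)."""
    # Imported lazily so chat_turn can be used without mlx-lm installed.
    from mlx_lm import generate, load

    model, tokenizer = load(repo_id)

    def reply_fn(messages: List[Message]) -> str:
        prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
        )
        return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)

    return reply_fn
```

Typical use would be to call `make_mlx_reply_fn("mlx-community/gemma-4-12b-coder-fable5-composer2.5")` once, then pass the returned function to `chat_turn` for each new user message, feeding the returned history back in on the next turn.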

Base and License

Free to use, modify, and redistribute under the Apache 2.0 license.

Model size: 12B params · Tensor type: F16 · Format: Safetensors (MLX)