MLX 8-Bit Quantized: Gemma-4-12B-Coder

This repository contains an 8-bit MLX-converted version of yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1.

The weights have been quantized to 8 bits, which roughly halves the memory needed compared with 16-bit weights while keeping reasoning and coding quality close to the original model. It is intended for local inference on Apple Silicon Macs using the mlx-lm library.
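As a back-of-the-envelope check on the memory savings, the sketch below estimates weight storage for 12B parameters at 16 bits and at 8 bits. It assumes group-wise quantization with a group size of 64 and a 16-bit scale plus a 16-bit bias per group (about 8.5 bits per weight). The settings actually used for this conversion are not stated here, and the estimate ignores the KV cache, activations, and any layers left unquantized.

```python
# Rough weight-memory estimate; every constant below is an assumption.
N_PARAMS = 12e9          # "12B" taken from the model name
GROUP_SIZE = 64          # assumed quantization group size
OVERHEAD_BITS = 16 + 16  # assumed 16-bit scale + 16-bit bias per group


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


fp16_gib = weight_memory_gib(N_PARAMS, 16)
q8_gib = weight_memory_gib(N_PARAMS, 8 + OVERHEAD_BITS / GROUP_SIZE)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"8-bit weights:  ~{q8_gib:.1f} GiB")
```

Under these assumptions the weights shrink from roughly 22 GiB to roughly 12 GiB, so actual headroom depends on how much unified memory the Mac has left for the context.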

How to Use with MLX

Install the required dependency:

pip install --upgrade mlx-lm
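Before downloading the weights, it may help to confirm that the machine and environment look suitable. The following is a minimal sketch using only the standard library; `mlx_preflight` is a hypothetical helper name, and the checks simply reflect the Apple Silicon target described above.

```python
import importlib.util
import platform


def mlx_preflight() -> dict:
    """Report whether this machine looks ready to run mlx-lm."""
    return {
        "is_macos": platform.system() == "Darwin",
        "is_arm64": platform.machine() == "arm64",
        "mlx_lm_installed": importlib.util.find_spec("mlx_lm") is not None,
    }


status = mlx_preflight()
for check, ok in status.items():
    print(f"{check}: {'ok' if ok else 'missing'}")
```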

Run inference from Python:

from mlx_lm import load, generate

# Load the 8-bit quantized MLX model.
model, tokenizer = load("nypswift/gemma-4-12b-coder-fable5-composer2.5-mlx-8bit")

prompt = "Write a Python script to sort a dictionary by its values."
messages = [{"role": "user", "content": prompt}]

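# Wrap the message in the model's chat template and append the assistant turn marker.
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# verbose=True prints the output as it is generated; max_tokens caps the response length.
response = generate(
    model,
    tokenizer,
    prompt=formatted_prompt,
    verbose=True,
    max_tokens=1024,
)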
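For a coding model, the next step is often to pull the code out of the reply. The sketch below assumes that `response` is a plain string and that the model wraps code in Markdown fences; neither is guaranteed here. `extract_code_blocks` is a hypothetical helper, and the sample string is a stand-in, not real model output.

```python
import re

# Matches a fenced block: three backticks, optional language tag, body, three backticks.
_FENCE_RE = re.compile(r"`{3}([\w+-]*)[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_code_blocks(text: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for each fenced block in `text`."""
    return [(lang, code.rstrip("\n")) for lang, code in _FENCE_RE.findall(text)]


# Stand-in string for illustration only; replace with `response` in practice.
fence = "`" * 3
sample = f"Here you go:\n{fence}python\nprint('hi')\n{fence}\nDone."
blocks = extract_code_blocks(sample)
print(blocks)
```

Generated code should be reviewed before it is executed.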

Base and License

Base model: yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1.

This conversion is free to use, modify, and redistribute under the Apache 2.0 license.
