Aero-Deuce — MLX 4-bit

A fine-tuned Gemma 4 12B instruction-following model. This is the MLX quantized version (~6.3 GB) optimized for Apple Silicon (M1/M2/M3/M4/M5).

Download

pip install mlx-lm
python -c "from mlx_lm import load; load('ZeZZm/aero-deuce-MLX')"

Alternatively, open the Files and versions tab and download the safetensors files manually, together with the config and tokenizer files that sit next to them.
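The manual download can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is available (it is usually installed as a dependency of mlx-lm). The `models` folder and the helper names `local_model_dir` and `download_model` are illustrative, not part of this repo.

```python
from pathlib import Path

REPO_ID = "ZeZZm/aero-deuce-MLX"  # repo name taken from this card


def local_model_dir(repo_id: str, root: str = "models") -> Path:
    """Map a Hub repo id to a local folder, e.g. models/aero-deuce-MLX."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str = REPO_ID, root: str = "models") -> Path:
    """Fetch every file in the repo into a local folder and return its path."""
    # Imported lazily so the path helper above works without the package.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_model()` should leave the files in `models/aero-deuce-MLX`, and that folder path can then be passed to `load(...)` in place of the repo name.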

Which format should I use?

| Format | Best for | Link |
| --- | --- | --- |
| GGUF Q4_K_M | Local inference, llama.cpp, LM Studio, GPT4All | ZeZZm/aero-deuce-GGUF |
| MLX ← you are here | Apple Silicon (Mac), fastest on M-series chips | This repo |
| LoRA Adapter | Merging with base model, further fine-tuning | ZeZZm/aero-deuce |

Quick Start

pip install mlx-lm

python -m mlx_lm.generate \
  --model ZeZZm/aero-deuce-MLX \
  --prompt "Explain quantum computing in simple terms." \
  --max-tokens 256

Interactive chat

from mlx_lm import load, generate

# Downloads the weights on first use, then loads them from the local cache.
model, tokenizer = load("ZeZZm/aero-deuce-MLX")

messages = [
    {"role": "user", "content": "Write a Python function to reverse a linked list."}
]

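# Render the conversation with the model's chat template; add_generation_prompt
# appends the marker that tells the model to start its reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)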
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
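The snippet above handles a single turn. For a multi-turn session, one option is to keep the message history and re-render the chat template on every turn. This is a sketch under assumptions: `chat_turn`, `make_mlx_reply_fn`, and `run_chat` are hypothetical helper names, and it assumes the chat template accepts alternating `user` and `assistant` roles.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              reply_fn: Callable[[List[Message]], str]) -> str:
    """Append the user message, get a reply, and record it in the history."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def make_mlx_reply_fn(model_id: str = "ZeZZm/aero-deuce-MLX",
                      max_tokens: int = 256) -> Callable[[List[Message]], str]:
    """Build a reply function backed by mlx-lm (imported lazily)."""
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)

    def reply_fn(history: List[Message]) -> str:
        # Re-render the full conversation so the model sees earlier turns.
        prompt = tokenizer.apply_chat_template(
            history, tokenize=False, add_generation_prompt=True
        )
        return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)

    return reply_fn


def run_chat() -> None:
    """Simple terminal loop; an empty line, 'exit', or 'quit' ends the session."""
    reply_fn = make_mlx_reply_fn()
    history: List[Message] = []
    while True:
        user_text = input("you> ").strip()
        if not user_text or user_text.lower() in {"exit", "quit"}:
            break
        print(chat_turn(history, user_text, reply_fn))
```

Calling `run_chat()` on an Apple Silicon Mac starts the loop. Because the whole history is re-encoded each turn, long sessions will eventually approach the context limit.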

Model Details

| Property | Value |
| --- | --- |
| Base Model | google/gemma-4-12b-it (12B params) |
| Training Method | QLoRA + Muon optimizer |
| Training Data | 30K instruction-following samples |
| Training Steps | 2,000 |
| Quantization | 4-bit (MLX) |
| File Size | ~6.3 GB |
| Context Length | 4,096 tokens |
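Given the 4,096-token context length listed above, the prompt and the generated output have to share that budget. A small sketch of one way to clamp `max_tokens` accordingly; `max_new_tokens_allowed` is a hypothetical helper, not part of mlx-lm.

```python
CONTEXT_LENGTH = 4096  # value from the Model Details table


def max_new_tokens_allowed(prompt_tokens: int, requested: int,
                           context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp the requested output length so prompt + output fit the context."""
    if prompt_tokens >= context_length:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens, which leaves no room "
            f"within a {context_length}-token context."
        )
    return min(requested, context_length - prompt_tokens)
```

The prompt length can be measured with `len(tokenizer.encode(prompt))`, assuming the tokenizer returned by `load` exposes `encode` as Hugging Face tokenizers do, and the result passed to `generate` as `max_tokens`.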

System Prompt

The chat template embeds a system prompt that identifies the model as Aero-Deuce. It is applied automatically whenever the template is used, so no extra configuration is needed.
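To confirm that the template injects the system prompt, the rendered prompt can be inspected before generation. A minimal sketch: `rendered_prompt_mentions` is a hypothetical helper, and searching for the literal string "Aero-Deuce" is an assumption about how the embedded prompt is worded.

```python
def rendered_prompt_mentions(tokenizer, marker: str = "Aero-Deuce") -> bool:
    """Render a one-message chat and report whether `marker` appears in it."""
    messages = [{"role": "user", "content": "Hello"}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # No system message was supplied above, so a match here means the
    # text came from the chat template itself.
    return marker in prompt
```

With the tokenizer returned by `load("ZeZZm/aero-deuce-MLX")`, a `True` result would indicate that the identity text is coming from the template rather than from the caller's messages.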

License

Apache 2.0
