Aero-Deuce — MLX 4-bit

Aero-Deuce is an instruction-following model fine-tuned from google/gemma-4-12b-it. This repo holds the 4-bit MLX quantization (~6.3 GB), intended for Apple Silicon Macs (M1 through M5).

Download

pip install mlx-lm
python -c "from mlx_lm import load; load('ZeZZm/aero-deuce-MLX')"

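The first call to load() downloads the weights into the local Hugging Face cache. Alternatively, open the Files and versions tab and download the safetensors files manually, together with the config and tokenizer files that sit next to them.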

Which format should I use?

Format	Best for	Link
GGUF Q4_K_M	Local inference, llama.cpp, LM Studio, GPT4All	ZeZZm/aero-deuce-GGUF
MLX ← you are here	Apple Silicon (Mac), fastest on M-series chips	This repo
LoRA Adapter	Merging with base model, further fine-tuning	ZeZZm/aero-deuce

Quick Start

pip install mlx-lm

python -m mlx_lm.generate \
  --model ZeZZm/aero-deuce-MLX \
  --prompt "Explain quantum computing in simple terms." \
  --max-tokens 256

Interactive chat

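from mlx_lm import load, generate

# Downloads the repo on first use, then loads it from the local cache
model, tokenizer = load("ZeZZm/aero-deuce-MLX")

messages = [
    {"role": "user", "content": "Write a Python function to reverse a linked list."}
]

# Render the conversation with the model's chat template and leave the
# assistant turn open so generation starts at the reply
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)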
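
The snippet above answers a single message. A multi-turn loop could be sketched as follows, assuming the chat template accepts the usual "user" and "assistant" roles. The names chat_turn, make_mlx_reply_fn and repl are hypothetical helpers introduced here for illustration; they are not part of mlx-lm. The history grows without trimming, so long sessions may need truncation to stay within the 4,096-token context.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              reply_fn: Callable[[List[Message]], str]) -> str:
    """Append the user message, ask reply_fn for an answer, and record it."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def make_mlx_reply_fn(repo: str = "ZeZZm/aero-deuce-MLX",
                      max_tokens: int = 256) -> Callable[[List[Message]], str]:
    """Build a reply function backed by mlx-lm (imported only when called)."""
    from mlx_lm import load, generate

    model, tokenizer = load(repo)

    def reply_fn(history: List[Message]) -> str:
        # Re-render the whole conversation on every turn
        prompt = tokenizer.apply_chat_template(
            history, tokenize=False, add_generation_prompt=True
        )
        return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)

    return reply_fn


def repl() -> None:
    """Read user input until an empty line, 'exit', 'quit', or end of input."""
    reply_fn = make_mlx_reply_fn()
    history: List[Message] = []
    while True:
        try:
            user_text = input("you> ").strip()
        except EOFError:
            break
        if user_text in {"", "exit", "quit"}:
            break
        print(chat_turn(history, user_text, reply_fn))
```

Calling repl() from a Python session starts the loop. Keeping chat_turn independent of the model makes it easy to swap in a different backend or a stub.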

Model Details

Property	Value
Base Model	google/gemma-4-12b-it (12B params)
Training Method	QLoRA + Muon optimizer
Training Data	30K instruction-following samples
Training Steps	2,000
Quantization	4-bit (MLX)
File Size	~6.3 GB
Context Length	4,096 tokens
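
As a rough sanity check on the file size, the packed weights alone can be estimated from the nominal parameter count and bit width in the table. This is a back-of-envelope sketch: quantized_weight_gb is a hypothetical helper, 12e9 is the nominal count rather than an exact one, and attributing the remainder to quantization metadata and tensors kept at higher precision is an assumption.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the packed weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal figures from the Model Details table
weights_gb = quantized_weight_gb(12e9, 4)
print(f"packed 4-bit weights: {weights_gb:.1f} GB")
print(f"remainder vs. the listed ~6.3 GB: {6.3 - weights_gb:.1f} GB")
```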

System Prompt

A system prompt identifying the model as Aero-Deuce is embedded in the chat template. It is applied whenever a prompt is built with apply_chat_template, so no extra configuration is needed.
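
To see what the template actually produces, the rendered prompt can be printed without generating anything. The sketch below assumes the repo's tokenizer loads through transformers (which mlx-lm installs as a dependency); render_prompt and show_rendered_prompt are hypothetical helper names, and the exact text of the embedded system prompt is not reproduced here.

```python
def render_prompt(tokenizer, user_text: str) -> str:
    """Render a single user message through the tokenizer's chat template."""
    messages = [{"role": "user", "content": user_text}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def show_rendered_prompt(repo: str = "ZeZZm/aero-deuce-MLX") -> None:
    """Download only the tokenizer files and print the rendered prompt."""
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo)
    print(render_prompt(tokenizer, "Who are you?"))
```

Running show_rendered_prompt() should print the full prompt string, including whatever system text the template injects ahead of the user message.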

License

Apache 2.0
