Gemma-4-12B-it-heretic MLX 4-Bit

This model was converted to MLX format from igorls/gemma-4-12B-it-heretic using mlx-vlm.

It follows the same Gemma 4 MLX conversion strategy as the public mlx-community / chia767 Gemma 4 12B ports: the model stays in the full gemma4_unified architecture, keeps the image and audio paths, and uses mixed precision for the language model.

Why this variant?

The source checkpoint is about 22 GB locally. This MLX build is about 10 GB and preserves the unified Gemma 4 multimodal route instead of stripping it down to text-only inference.

Although this is tagged as 4-bit, it is not a pure all-layer 4-bit quantization. The default quantization is 4-bit affine with group size 64, while all 48 MLP gate_proj, up_proj, and down_proj layers are kept at 8-bit. The converter reported 7.355 bits per weight.

That is why this model is larger than a compact text-only MLX-LM 4-bit conversion, but much closer to the public Gemma 4 MLX ports in behavior and architecture.
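
The reported 7.355 bits per weight can be sanity-checked with a rough calculation. The sketch below is only an estimate and rests on assumptions that do not come from the converter output: that affine quantization stores one 16-bit scale and one 16-bit bias per group, that every weight falls into either the 4-bit or the 8-bit bucket (any layers left unquantized are ignored), and that the source checkpoint uses 16 bits per weight. The function names are illustrative.

```python
def effective_bits(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits per weight including per-group overhead.

    Assumption: each group carries a 16-bit scale and a 16-bit bias (32 bits total).
    """
    return bits + overhead_bits / group_size


def implied_high_fraction(reported: float, low: float, high: float) -> float:
    """Share of weights in the higher-precision bucket implied by a reported average.

    Assumes only two buckets exist: `low` and `high` bits per weight.
    """
    return (reported - low) / (high - low)


def estimated_size_gb(source_gb: float, reported_bpw: float, source_bpw: float = 16.0) -> float:
    """Scale the source checkpoint size by the ratio of bits per weight."""
    return source_gb * reported_bpw / source_bpw


low = effective_bits(4, 64)    # default 4-bit layers, group size 64
high = effective_bits(8, 64)   # MLP projections kept at 8-bit, group size 64
frac = implied_high_fraction(7.355, low, high)
size = estimated_size_gb(22.0, 7.355)

print(f"4-bit bucket: {low:.2f} bpw, 8-bit bucket: {high:.2f} bpw")
print(f"implied 8-bit share of weights: {frac:.1%}")
print(f"estimated size: {size:.1f} GB")
```

Under these assumptions roughly 71% of the weights sit in the 8-bit bucket, and scaling the 22 GB source by 7.355/16 gives an estimate close to the 10 GB build size.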

Use with MLX-VLM

pip install -U mlx-vlm

python -m mlx_vlm.generate \
  --model IvanSmit05/gemma-4-12B-it-heretic-mlx-4bit \
  --max-tokens 100 \
  --temperature 0.0 \
  --prompt "Briefly describe this image." \
  --image <path_to_image>

For text-only prompts:

python -m mlx_vlm.generate \
  --model IvanSmit05/gemma-4-12B-it-heretic-mlx-4bit \
  --max-tokens 200 \
  --temperature 0.7 \
  --prompt "Write a short story about a rogue AI."
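
To script the commands above from Python, one option is to assemble the argument list and hand it to subprocess. This is a minimal sketch that uses only the flags shown above; the helper name build_generate_command and the example image path are hypothetical, and the sketch does not use any mlx-vlm Python API.

```python
import shlex
import sys

MODEL_ID = "IvanSmit05/gemma-4-12B-it-heretic-mlx-4bit"


def build_generate_command(prompt, image=None, max_tokens=100, temperature=0.0):
    """Build the argv list for `python -m mlx_vlm.generate` (hypothetical helper).

    Only the flags that appear in the commands above are used.
    """
    cmd = [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL_ID,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
    ]
    # The image flag is added only for multimodal prompts.
    if image is not None:
        cmd += ["--image", image]
    return cmd


if __name__ == "__main__":
    text_cmd = build_generate_command(
        "Write a short story about a rogue AI.", max_tokens=200, temperature=0.7
    )
    # Print the shell-quoted command for inspection instead of running it.
    print(shlex.join(text_cmd))
    # To execute it, pass the list to subprocess.run(text_cmd, check=True).
```

Passing a list rather than a single string avoids shell quoting problems when the prompt contains quotes or spaces.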