Gemma-4-12B-it-heretic MLX 4-Bit
This model was converted to MLX format from igorls/gemma-4-12B-it-heretic using mlx-vlm.
It follows the same Gemma 4 MLX conversion strategy used by the public mlx-community / chia767 Gemma 4 12B ports: the model stays in the full gemma4_unified architecture, keeps the image/audio path, and uses mixed precision for the language model.
Why this variant?
The source checkpoint is about 22 GB on disk. This MLX build is about 10 GB and preserves the unified Gemma 4 multimodal route instead of stripping it down to text-only inference.
Although this is tagged as 4-bit, it is not a pure all-layer 4-bit quantization. The default quantization is 4-bit affine with group size 64, while all 48 MLP gate_proj, up_proj, and down_proj layers are kept at 8-bit. The converter reported 7.355 bits per weight.
That is why this model is larger than a compact text-only MLX-LM 4-bit conversion, but much closer to the public Gemma 4 MLX ports in behavior and architecture.
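To see why a mixed 4-bit / 8-bit build lands well above 4 bits per weight, here is a rough sketch of the arithmetic. It assumes each quantization group stores a 16-bit scale and a 16-bit bias (32 bits of overhead per 64 weights), that the 8-bit layers use the same group size, and that no tensors are left unquantized. The `fraction_8bit` value in the example is hypothetical and is not the actual split in this model.

```python
def affine_bits_per_weight(bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Effective storage cost of group-wise affine quantization.

    meta_bits is the assumed per-group overhead: one 16-bit scale
    plus one 16-bit bias shared by every weight in the group.
    """
    return bits + meta_bits / group_size


def mixed_bits_per_weight(fraction_8bit: float, group_size: int = 64) -> float:
    """Weighted average when a fraction of the weights is 8-bit and the rest 4-bit."""
    low = affine_bits_per_weight(4, group_size)
    high = affine_bits_per_weight(8, group_size)
    return (1.0 - fraction_8bit) * low + fraction_8bit * high


print(affine_bits_per_weight(4, 64))  # 4.5
print(affine_bits_per_weight(8, 64))  # 8.5
print(mixed_bits_per_weight(0.5))     # 6.5 for a hypothetical even split
```

Under these assumptions, any build that keeps a large share of its parameters at 8-bit will report a density between 4.5 and 8.5 bits per weight, which is consistent with the 7.355 figure reported by the converter.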
Use with MLX-VLM
pip install -U mlx-vlm
python -m mlx_vlm.generate \
--model IvanSmit05/gemma-4-12B-it-heretic-mlx-4bit \
--max-tokens 100 \
--temperature 0.0 \
--prompt "Briefly describe this image." \
--image <path_to_image>
For text-only prompts:
python -m mlx_vlm.generate \
--model IvanSmit05/gemma-4-12B-it-heretic-mlx-4bit \
--max-tokens 200 \
--temperature 0.7 \
--prompt "Write a short story about a rogue AI."
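If you want to script these calls, one option is to assemble the same command line from Python. The sketch below only uses the flags shown above (`--model`, `--max-tokens`, `--temperature`, `--prompt`, `--image`). The helper name `build_generate_command` is made up for illustration and is not part of mlx-vlm. Actually running the command assumes mlx-vlm is installed on a machine that supports MLX.

```python
import shlex
import subprocess
import sys
from typing import List, Optional

MODEL_ID = "IvanSmit05/gemma-4-12B-it-heretic-mlx-4bit"


def build_generate_command(
    prompt: str,
    image: Optional[str] = None,
    max_tokens: int = 100,
    temperature: float = 0.0,
) -> List[str]:
    """Build the argument list for `python -m mlx_vlm.generate` (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL_ID,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
    ]
    # The image flag is only added for multimodal prompts.
    if image is not None:
        cmd += ["--image", image]
    return cmd


cmd = build_generate_command(
    "Write a short story about a rogue AI.", max_tokens=200, temperature=0.7
)
print(shlex.join(cmd))  # show the command that would be executed

# Uncomment to run it (requires mlx-vlm to be installed):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or spaces.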
Conversion details
- Source model: igorls/gemma-4-12B-it-heretic
- Original base model: google/gemma-4-12B-it
- MLX library: mlx-vlm
- Architecture: gemma4_unified
- Pipeline: any-to-any
- Quantization: 4-bit affine, group size 64
- Mixed precision: all language-model MLP projections kept at 8-bit
- Reported quantization density: 7.355 bits per weight
- Local safetensor size: 10.233 GiB
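As a quick sanity check, the reported density and file size can be combined to estimate the parameter count. This is a back-of-the-envelope sketch: it assumes the safetensor size is almost entirely weight data and that the bits-per-weight figure is an average over every parameter in the file.

```python
GIB = 2**30  # bytes per GiB


def implied_param_count(size_gib: float, bits_per_weight: float) -> float:
    """Estimate parameter count from on-disk size and average bits per weight."""
    total_bits = size_gib * GIB * 8
    return total_bits / bits_per_weight


params = implied_param_count(10.233, 7.355)
print(f"{params / 1e9:.2f}B parameters")  # roughly 12 billion
```

The estimate comes out close to 12 billion, which lines up with the 12B label on the source model.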
Refer to the original model card for details on the Heretic abliteration process, behavior, licensing, and usage notes.