Gemma 4 E4B Instruct — 4-bit MLX

A 4-bit MLX quantization of google/gemma-4-E4B-it for Apple Silicon (~4.9 GB). This is a vision-language model, so run it with mlx-vlm, not mlx-lm.

Usage

pip install -U mlx-vlm

python -m mlx_vlm.generate \
  --model TyKaoz/gemma-4-E4B-it-4bit \
  --prompt "Explain quantization in one sentence." \
  --max-tokens 200
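
To call the same command from a script, a thin wrapper around the CLI is one option. The sketch below is a minimal illustration, not part of mlx-vlm: the helper names `build_generate_command` and `run_generate` are hypothetical, and it passes only the three flags shown in the command above.

```python
import subprocess
import sys


def build_generate_command(model, prompt, max_tokens=200):
    """Build the argv list for the mlx_vlm.generate CLI.

    Hypothetical helper: it only mirrors the flags used in the
    command-line example (--model, --prompt, --max-tokens).
    """
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
    ]


def run_generate(model, prompt, max_tokens=200):
    """Run the CLI and return its captured stdout.

    Requires mlx-vlm to be installed in the current environment;
    raises subprocess.CalledProcessError if the command fails.
    """
    cmd = build_generate_command(model, prompt, max_tokens)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_generate("TyKaoz/gemma-4-E4B-it-4bit", "Explain quantization in one sentence.")` should return whatever the CLI prints. Passing the arguments as a list avoids shell quoting problems in the prompt.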

Base	Tool	Precision	Size
`google/gemma-4-E4B-it`	`mlx-vlm`	4-bit · group 64	~4.9 GB
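
As a rough guide to what "4-bit · group 64" costs in storage, the sketch below estimates the effective bits per quantized weight. It assumes each group of 64 weights also stores one 16-bit scale and one 16-bit bias, as in MLX's affine quantization; the function names and the one-billion-parameter example are hypothetical. It will not reproduce the ~4.9 GB figure, which presumably also covers parts of the model that are not stored at 4 bits.

```python
def effective_bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Bits stored per weight, counting per-group overhead.

    Assumption: every group carries one scale and one bias,
    each stored at 16 bits.
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_size_gb(num_params, bits=4, group_size=64):
    """Approximate size in GB (10^9 bytes) of num_params quantized weights."""
    total_bits = num_params * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


# 4 payload bits + 32 overhead bits shared by 64 weights = 4.5 bits per weight
print(effective_bits_per_weight())
# Hypothetical example: one billion quantized weights
print(quantized_size_gb(1_000_000_000))
```

A smaller group size improves accuracy at the cost of more scale and bias overhead; with group 32 the same calculation gives 5 bits per weight.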

By TyKaoz — privacy-first native macOS LLM chat client. Apache 2.0, inherited from the base model.

Safetensors: model size 2B params; tensor types BF16, U32
