ToPo-ToPo/gemma-4-26B-A4B-it-mlx-8bit

MLX 8-bit conversion of google/gemma-4-26b-a4b-it (Gemma 4 26B-A4B, MoE with 128 experts), made with mlx-vlm 0.6.3.

Provenance (self-converted)

Source: google/gemma-4-26b-a4b-it (license: gemma)
Quantization: 8-bit (group_size=64, ~8.67 bpw)
Tool: mlx_vlm.convert from mlx-vlm 0.6.3 (the source checkpoint already stores the experts fused, so no patch was needed)
Validated: loads and translates correctly under mlx-vlm 0.6.3.
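
The bits-per-weight figure above can be sanity-checked with a little arithmetic. The sketch below assumes affine group quantization in which every group of weights stores one 16-bit scale and one 16-bit bias next to the packed values; that storage layout is an assumption, not something this card states. Under it, a fully quantized tensor costs 8.5 bits per weight. The reported ~8.67 is a little higher, which would be consistent with some tensors being kept at higher precision, though the card does not say which ones.

```python
def affine_bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Effective bits per weight for one quantized tensor.

    Assumption: each group of `group_size` weights carries one scale and
    one bias, each stored in `scale_bits` / `bias_bits` bits.
    """
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


# Settings from this card: 8-bit values, group_size=64.
print(affine_bits_per_weight(8, 64))  # 8 + 32/64 = 8.5
```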

Usage

from mlx_vlm import load
model, processor = load("ToPo-ToPo/gemma-4-26B-A4B-it-mlx-8bit")
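
Since this conversion was validated under mlx-vlm 0.6.3, it can help to check the installed version before loading. The following is a minimal sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helper names, and the parsing is deliberately simple (it keeps only the leading integers, so a pre-release such as 0.6.3rc1 is treated like 0.6.3).

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 6, 3)  # minimum mlx-vlm version stated in this card


def parse_version(text):
    """Turn '0.6.3' or '0.6.3.post1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


try:
    installed = version("mlx-vlm")
    status = "ok" if meets_minimum(installed) else "too old"
    print("mlx-vlm", installed, status)
except PackageNotFoundError:
    print("mlx-vlm is not installed")
```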

License

Derivative of Google Gemma; governed by the Gemma Terms of Use (https://ai.google.dev/gemma/terms) and Prohibited Use Policy. Converted to MLX.

⚡ Faster generation with MTP (speculative decoding, lossless)

Recommended drafter: google/gemma-4-26B-A4B-it-assistant, Google's official MTP drafter for this model. It loads directly in mlx-vlm (no conversion needed). Generation can be up to ~3x faster; on short prompts the measured speedup was ≈1.4–1.5x. Output is identical to non-MTP decoding.

# requires:  pip install "mlx-vlm>=0.6.3"
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the quantized main model and Google's MTP drafter.
model, processor = load("ToPo-ToPo/gemma-4-26B-A4B-it-mlx-8bit")
draft_model, _   = load("google/gemma-4-26B-A4B-it-assistant")
config = load_config("ToPo-ToPo/gemma-4-26B-A4B-it-mlx-8bit")

# Text-only prompt, so num_images=0.
prompt = apply_chat_template(processor, config, "Hello!", num_images=0)
# draft_kind="mtp" must be passed explicitly in the Python API.
out = generate(model, processor, prompt,
               draft_model=draft_model, draft_kind="mtp", max_tokens=256)
print(out)

CLI (draft_kind is auto-detected):

mlx_vlm.generate --model ToPo-ToPo/gemma-4-26B-A4B-it-mlx-8bit --draft-model google/gemma-4-26B-A4B-it-assistant

Notes

draft_kind="mtp" is required in the Python API (the CLI auto-detects it).
Use the drafter listed above for this model; drafters are size-specific and not interchangeable across Gemma 4 variants.
Requires mlx-vlm >= 0.6.3. MTP is lossless, so if the output differs from non-MTP decoding, a version mismatch is the likely cause.
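
To illustrate why draft-then-verify decoding can be lossless, here is a toy sketch under greedy decoding. It is not mlx-vlm's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the main model and the drafter, and the verification loop stands in for what a real system would do in one batched forward pass. The drafter proposes a block of tokens, the target keeps every token it agrees with, and at the first disagreement the target's own token is used instead, so the final sequence is exactly what the target alone would have produced.

```python
def target_next(tokens):
    """Toy stand-in for the main model: a deterministic next-token rule."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_next(tokens):
    """Toy stand-in for the drafter: wrong at every third position."""
    guess = target_next(tokens)
    return guess if len(tokens) % 3 else (guess + 1) % 50


def greedy_decode(prompt, max_tokens):
    """Plain decoding: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        tokens.append(target_next(tokens))
    return tokens[len(prompt):]


def speculative_decode(prompt, max_tokens, block=4):
    """Draft a block, then let the target verify it token by token."""
    tokens = list(prompt)
    produced = 0
    rounds = 0  # verification rounds (one batched target pass each, in practice)
    while produced < max_tokens:
        # 1) The drafter proposes `block` tokens autoregressively.
        proposal = []
        for _ in range(block):
            proposal.append(draft_next(tokens + proposal))
        # 2) The target checks the proposal; the first mismatch is replaced
        #    by the target's own token and the rest of the block is dropped.
        rounds += 1
        accepted = []
        for guess in proposal:
            correct = target_next(tokens + accepted)
            accepted.append(correct)
            if guess != correct:
                break
        accepted = accepted[: max_tokens - produced]
        tokens.extend(accepted)
        produced += len(accepted)
    return tokens[len(prompt):], rounds


plain = greedy_decode([1, 2, 3], 32)
spec, rounds = speculative_decode([1, 2, 3], 32)
print(spec == plain)  # True: same tokens as plain greedy decoding
print("verification rounds:", rounds, "vs", len(plain), "plain target calls")
```

Every emitted token is one the target itself would have chosen, so the drafter's quality affects only how many rounds are needed, not the output.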
