MiniMax-M3 · REAP-40 · JANG_2L

⚠️ Requires vMLX ≥ v1.5.62

vMLX builds before v1.5.62 contain a runtime cache bug that causes repetition loops on long outputs. This is an engine issue, not a weights issue: update vMLX to v1.5.62 or later before running this model. On v1.5.62+, generation is clean.
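A minimal sketch of the version gate, assuming vMLX version strings follow the vMAJOR.MINOR.PATCH pattern used above. How to read the installed version is not covered by this card, so the helper takes it as a plain string; parse_version and is_supported are illustrative names, not part of vMLX.

```python
import re

MIN_VMLX = (1, 5, 62)  # minimum version stated in this card


def parse_version(text: str) -> tuple:
    """Turn 'v1.5.62' or '1.5.62' into (1, 5, 62)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(installed: str) -> bool:
    """Compare numerically, component by component, not as raw strings."""
    return parse_version(installed) >= MIN_VMLX
```

Comparing integer tuples rather than strings matters here: as text, "1.5.9" sorts after "1.5.62", although it is the older build.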

A space-efficient MiniMax-M3 bundle for Apple Silicon: 40 % REAP expert pruning plus JANG_2L mixed-precision quantization, ~95 GB in total. It is the smaller, lower-RAM sibling of REAP-32 (more pruning, less resident memory) and runs comfortably on a single 128 GB Mac via vMLX / MLX.

What this is

  • Base: MiniMax-M3 (model_type=minimax_m3_vl) — MoE, GQA-4, MSA Lightning Indexer, vision tower.
  • Pruning: REAP saliency pruning, 40 % of routed experts removed (77 of 128 kept per MoE layer), highest-saliency experts retained.
  • Quantization (JANG_2L, affine, group size 64):
    tensor                                bits
    routed experts gate_proj / up_proj    2
    routed experts down_proj              3
    shared experts                        6
    dense MLP (layers 0–2)                6
    attention q/k/v/o                     8
    embeddings                            6
    lm_head                               8
    vision tower + projectors             8
    norms, router gate, MSA indexer       fp16

    Within the routed experts, down_proj is kept at 3-bit while gate_proj and up_proj are 2-bit, for stable long-form coherence. The full per-module bit map is written into config.json (quantization) and applied automatically by the loader.
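
The numbers above can be checked with a little arithmetic. The sketch below rests on assumptions that are not stated in this card: that each group of 64 weights stores one fp16 scale and one fp16 bias (the usual layout for affine group quantization), that gate_proj, up_proj and down_proj hold the same number of parameters, and that the 40 % expert count is rounded down.

```python
GROUP_SIZE = 64   # from the card
SCALE_BITS = 16   # assumption: one fp16 scale per group
BIAS_BITS = 16    # assumption: one fp16 bias per group


def effective_bits(bits: int, group_size: int = GROUP_SIZE) -> float:
    """Stored bits per weight, including per-group scale and bias overhead."""
    return bits + (SCALE_BITS + BIAS_BITS) / group_size


# Expert pruning: 40 % of 128 routed experts removed per MoE layer.
total_experts = 128
removed = total_experts * 40 // 100   # rounding down is an assumption
kept = total_experts - removed
print(kept, removed)                  # 77 51

# Average cost of a routed expert, assuming equal-sized projections.
routed = {"gate_proj": 2, "up_proj": 2, "down_proj": 3}
avg = sum(effective_bits(b) for b in routed.values()) / len(routed)
print(round(avg, 3))                  # 2.833
```

Under these assumptions the group overhead adds 0.5 bits per weight, so the nominal 2-bit and 3-bit tensors cost about 2.5 and 3.5 bits each once scales and biases are counted.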

Which one to use

  • REAP-40 (this repo, ~95 GB): smaller, lower resident RAM, more headroom.
  • REAP-32 (~105 GB): keeps more experts; higher quality ceiling, closer to the RAM limit on 128 GB machines.

Usage

Load in vMLX (v1.5.62+); the engine autodetects minimax_m3_vl and applies the correct settings (native MSA cache, paged cache off, per-module quant map). Sampling defaults ship in generation_config.json (temperature=1.0, top_p=0.95).
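
To confirm what a downloaded bundle actually declares before loading it, the two JSON files mentioned above can be read directly. This is a sketch using only the Python standard library, not the vMLX loader; summarize_bundle is a hypothetical helper, and it assumes both files sit at the top level of the model directory and use the key names quoted in this card.

```python
import json
from pathlib import Path


def summarize_bundle(model_dir: str) -> dict:
    """Report the model type, quant-map presence and sampling defaults."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    gen = json.loads((root / "generation_config.json").read_text())
    return {
        "model_type": config.get("model_type"),
        # The card says the per-module bit map lives under "quantization".
        "has_quant_map": "quantization" in config,
        "temperature": gen.get("temperature"),
        "top_p": gen.get("top_p"),
    }
```

Calling summarize_bundle on the local download directory should report minimax_m3_vl, a present quantization map, and the sampling defaults listed above if the bundle matches this card.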

Attribution

  • Quantization & packaging: Jinho Jang · eric@jangq.ai
  • Base model © MiniMax, used under the MiniMax-M3 license.