Instructions to use JANGQ-AI/MiniMax-M3-Medium-JANG_2L with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use JANGQ-AI/MiniMax-M3-Medium-JANG_2L with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("JANGQ-AI/MiniMax-M3-Medium-JANG_2L")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use JANGQ-AI/MiniMax-M3-Medium-JANG_2L with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "JANGQ-AI/MiniMax-M3-Medium-JANG_2L"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent

    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "JANGQ-AI/MiniMax-M3-Medium-JANG_2L" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
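If `~/.pi/agent/models.json` already lists other providers, pasting the snippet over the file would discard them. The following is a minimal Python sketch of merging the `mlx-lm` provider into an existing configuration instead. The helper names `with_mlx_provider` and `update_models_file` are hypothetical, and the sketch assumes the file contains only the `providers` structure shown above; Pi's full schema is not described here.

```python
import json
from pathlib import Path

MODEL_ID = "JANGQ-AI/MiniMax-M3-Medium-JANG_2L"


def with_mlx_provider(config, model_id=MODEL_ID,
                      base_url="http://localhost:8080/v1"):
    """Return a copy of `config` with the mlx-lm provider added or updated."""
    # Round-trip through JSON to deep-copy plain JSON-compatible data.
    merged = json.loads(json.dumps(config))
    provider = merged.setdefault("providers", {}).setdefault("mlx-lm", {})
    provider.update(
        {"baseUrl": base_url, "api": "openai-completions", "apiKey": "none"}
    )
    models = provider.setdefault("models", [])
    # Append the model only once so repeated runs are idempotent.
    if not any(entry.get("id") == model_id for entry in models):
        models.append({"id": model_id})
    return merged


def update_models_file(path=Path.home() / ".pi" / "agent" / "models.json"):
    """Read the config if it exists, merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_mlx_provider(existing), indent=2) + "\n")
```

Calling `update_models_file()` once before starting `pi` would apply the merge; `with_mlx_provider` alone has no side effects and can be inspected first.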
- Hermes Agent
How to use JANGQ-AI/MiniMax-M3-Medium-JANG_2L with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "JANGQ-AI/MiniMax-M3-Medium-JANG_2L"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup

    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default JANGQ-AI/MiniMax-M3-Medium-JANG_2L
Run Hermes
hermes
- MLX LM
How to use JANGQ-AI/MiniMax-M3-Medium-JANG_2L with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Interactive chat REPL
    mlx_lm.chat --model "JANGQ-AI/MiniMax-M3-Medium-JANG_2L"
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm

    # Start the server (listens on port 8080 by default)
    mlx_lm.server --model "JANGQ-AI/MiniMax-M3-Medium-JANG_2L"

    # Call the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "JANGQ-AI/MiniMax-M3-Medium-JANG_2L",
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
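The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming the server is reachable at `http://localhost:8080/v1` (the base URL used in the Pi and Hermes configuration above) and that the response follows the usual OpenAI chat-completions shape. The helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

# Assumption: local server on port 8080, as in the Pi/Hermes base URLs.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "JANGQ-AI/MiniMax-M3-Medium-JANG_2L"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]

# Usage (requires the server to be running):
#   print(chat("Hello"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before any network call is made.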
MiniMax-M3 · REAP-32 · JANG_2L
⚠️ Requires vMLX ≥ v1.5.62
Earlier vMLX builds contain a runtime cache bug that causes repetition loops on long output. This is an engine issue, not a weights issue — update vMLX to v1.5.62 or later before running this model. On v1.5.62+ generation is clean.
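A version gate like this is easy to get wrong with plain string comparison, since `"1.5.7"` sorts after `"1.5.62"` as text. The sketch below compares version tags numerically. It assumes the installed version is available as a string such as `v1.5.62`; how to obtain that string from vMLX is not covered here, and the helper names are hypothetical.

```python
MIN_VMLX = "v1.5.62"  # minimum version stated in this card


def parse_version(tag):
    """Turn 'v1.5.62' or '1.5.62' into a tuple of ints: (1, 5, 62)."""
    return tuple(int(part) for part in tag.lstrip("vV").split("."))


def meets_minimum(installed, minimum=MIN_VMLX):
    """True if `installed` is at least `minimum`, compared field by field."""
    return parse_version(installed) >= parse_version(minimum)
```

For example, `meets_minimum("v1.5.7")` is false because 7 is less than 62 in the third field, while `meets_minimum("v1.5.100")` is true.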
A space-efficient MiniMax-M3 bundle for Apple Silicon: 32 % REAP expert pruning + JANG_2L mixed-precision quantization, ~105 GB, runs on a single 128 GB Mac via vMLX / MLX.
What this is
- Base: MiniMax-M3 (model_type=minimax_m3_vl), a MoE model with GQA-4, MSA Lightning Indexer, and a vision tower.
- Pruning: REAP saliency pruning, 32 % of routed experts removed (87 of 128 kept per MoE layer), highest-saliency experts retained.
- Quantization (JANG_2L, affine, group size 64):

| tensor | bits |
| --- | --- |
| routed experts gate_proj/up_proj | 2 |
| routed experts down_proj | 3 |
| shared experts | 6 |
| dense MLP (layers 0–2) | 6 |
| attention q/k/v/o | 8 |
| embeddings | 6 |
| lm_head | 8 |
| vision tower + projectors | 8 |
| norms, router gate, MSA indexer | fp16 |

down_proj is kept at 3-bit (the rest of the routed experts are 2-bit) for stable long-form coherency. The full per-module bit map is written into config.json (quantization) and applied automatically by the loader.
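To make the "affine, group size 64" description and the pruning arithmetic concrete, here is a minimal pure-Python sketch of generic affine group quantization. It is not the JANG_2L or MLX implementation. The function names are illustrative, and the metadata cost in `effective_bits` assumes one fp16 scale and one fp16 bias per group, which this card does not confirm.

```python
def quantize_group(values, bits):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1            # e.g. 3 for 2-bit, 7 for 3-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo             # `lo` acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def effective_bits(bits, group_size=64, meta_bits=32):
    """Bits per weight including per-group metadata.

    Assumption: one fp16 scale plus one fp16 bias (32 bits) per group.
    """
    return bits + meta_bits / group_size


def kept_experts(total=128, pruned_fraction=0.32):
    """Experts remaining per MoE layer after pruning a fraction of them."""
    return total - round(total * pruned_fraction)
```

Under these assumptions a 2-bit tensor costs about 2.5 bits per weight, and pruning 32 % of 128 experts removes 41 and leaves 87, matching the count listed above. The worst-case reconstruction error within a group is half a quantization step, which is why fewer bits hurt most on tensors with a wide value range.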
Usage
Load in vMLX (v1.5.62+); the engine autodetects minimax_m3_vl and applies the correct
settings (native MSA cache, paged cache off, per-module quant map). Sampling defaults ship
in generation_config.json (temperature=1.0, top_p=0.95).
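For readers unfamiliar with these two sampling settings, the sketch below shows what `temperature` and `top_p` conventionally mean. It is a generic nucleus-sampling illustration in pure Python and not vMLX's sampler; all function names are illustrative.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature=1.0 leaves them unscaled."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract max for stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.95):
    """Keep the most probable tokens until their mass reaches top_p.

    Returns {token_index: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=1.0, top_p=0.95, rng=random):
    """Draw one token index from the temperature-scaled, top-p-filtered dist."""
    dist = top_p_filter(softmax(logits, temperature), top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

With `top_p=0.95`, only the low-probability tail (at most about 5 % of the mass) is excluded at each step, so sampling stays diverse while avoiding very unlikely tokens.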
Attribution
- Quantization & packaging: Jinho Jang · eric@jangq.ai
- Base model © MiniMax, used under the MiniMax-M3 license.