5-bit MiniMax M3 running locally on a single M3 Ultra 512GB via Unsloth!

#3
by danielhanchen - opened
Unsloth AI org

Hey guys, you can now run and train MiniMax M3 in Unsloth Studio. GitHub
Recommended inference settings are automatically set. Guide

Example of MiniMax M3 (5-bit GGUF) running in Unsloth Studio.


I got this working on my 512GB M3 Ultra too. One question though: do these GGUFs still have the MTP heads, vision encoder, etc. intact (just not utilized), or will updated GGUFs be needed in the future to re-enable those features once llama.cpp and other runtimes pick them up? Thanks for all your work on this!
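In case it helps anyone check this themselves, here is a rough sketch of how the tensor names in a GGUF could be listed and grouped. It assumes the `gguf` Python package (`pip install gguf`) and its `GGUFReader` class. The name markers are guesses based on how llama.cpp names MTP ("nextn") and vision projector tensors for other models, not the confirmed MiniMax M3 layout, and `classify_tensors` / `tensor_names_from_gguf` are hypothetical helper names. Vision weights are also often shipped in a separate mmproj file, so an empty "vision" group in the main GGUF would not prove much on its own.

```python
# Assumed markers only: real MiniMax M3 GGUF tensor names may differ.
MTP_SUBSTRINGS = (".nextn.", "mtp")   # guess, based on other models' MTP tensors
VISION_PREFIXES = ("v.", "mm.")       # guess, based on typical mmproj tensors


def classify_tensors(names):
    """Group tensor names into 'mtp', 'vision', and 'other' buckets."""
    groups = {"mtp": [], "vision": [], "other": []}
    for name in names:
        if any(marker in name for marker in MTP_SUBSTRINGS):
            groups["mtp"].append(name)
        elif name.startswith(VISION_PREFIXES):
            groups["vision"].append(name)
        else:
            groups["other"].append(name)
    return groups


def tensor_names_from_gguf(path):
    """Read tensor names from one GGUF file (requires the `gguf` package)."""
    from gguf import GGUFReader  # imported lazily so classify_tensors works without it

    reader = GGUFReader(path)
    return [tensor.name for tensor in reader.tensors]
```

Usage would be something like `classify_tensors(tensor_names_from_gguf("model.gguf"))`, where the path is a placeholder. For a GGUF split into several parts, each part would need to be read separately.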

According to the vLLM blog, they got it working with EAGLE3 speculative decoding:

https://vllm.ai/blog/2026-06-12-minimax-m3-vllm

Speculative decoding: EAGLE3 support with the draft model released at Inferact/MiniMax-M3-EAGLE3.

Not sure if the same can be done with llama.cpp or Unsloth Studio?
